Enron Email Dataset

  • Last Update: December 24, 2013; Created: December 24, 2013



Title of the dataset Enron Email Dataset
Provenance of the dataset http://www.cs.cmu.edu/~enron/
How were the data collected/created? What was the cost? The dataset was collected and prepared by the CALO Project. It was later purchased by Leslie Kaelbling at MIT and turned out to have a number of integrity problems, which several people at SRI, notably Melinda Gervasio, worked hard to correct.
Data sharing policy Other

About data analysis and simulation

Type of data text
Variable labels of dataset (the names of the variables) CONTENT (EMAIL)|FROM (EMAIL)|TO (EMAIL)
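As an illustration of how the three variables above might be pulled out of one raw message, here is a minimal sketch using Python's standard email module. It assumes each message is stored as plain RFC 822-style text with From and To headers; the function name extract_fields and the sample message are hypothetical and are not part of the dataset.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def extract_fields(raw_message: str) -> dict:
    """Return the FROM, TO and CONTENT variables for one raw message.

    Hypothetical helper: assumes a plain-text message with standard headers.
    """
    msg = message_from_string(raw_message)

    # parseaddr returns a (display name, address) pair; keep the address only.
    _, from_addr = parseaddr(msg.get("From", ""))

    # A message may have several recipients, so TO is returned as a list.
    to_addrs = [addr for _, addr in getaddresses(msg.get_all("To", []))]

    # For a non-multipart message the payload is the body text.
    payload = msg.get_payload()
    content = payload.strip() if isinstance(payload, str) else ""

    return {"FROM": from_addr, "TO": to_addrs, "CONTENT": content}


# Made-up sample message, for illustration only.
SAMPLE = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Subject: Schedule\n"
    "\n"
    "Please review the schedule.\n"
)

if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Multipart messages are returned with empty CONTENT in this sketch; handling them would require walking the message parts.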
Outline of data This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 500,000 messages in total. The data was originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation.
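Because the messages are organized into folders per user, a per-user message count is a natural first check after downloading the corpus. The sketch below assumes a layout of root/user/folder/message-file with one message per file; that layout, the function name count_messages_per_user, and the path in the usage line are assumptions rather than details stated on this page.

```python
from collections import Counter
from pathlib import Path


def count_messages_per_user(root: str) -> Counter:
    """Count message files under each top-level user directory.

    Assumes (hypothetically) the layout root/<user>/<folder>/<message file>.
    """
    root_path = Path(root)
    counts = Counter()
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        parts = path.relative_to(root_path).parts
        # Skip stray files that sit directly in the root directory.
        if len(parts) < 2:
            continue
        counts[parts[0]] += 1  # parts[0] is the user directory name
    return counts


if __name__ == "__main__":
    # "maildir" is a placeholder path to the unpacked corpus.
    for user, n in count_messages_per_user("maildir").most_common(10):
        print(user, n)
```

Summing the counts should give a figure close to the corpus total quoted above, which offers a quick sanity check that the download is complete.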
Simulation process
Expected outcome of the process (obtained knowledge, analysis results, output of tools)
Anticipation for analyses/simulations other than the typical ones provided above


What kind of data/tools do you wish to have?
Visualized information
Sample data
