Title of the dataset Enron Email Dataset
Provenance of the dataset http://www.cs.cmu.edu/~enron/
How were the data collected/created? What was the cost? This dataset was collected and prepared by the CALO Project. The email data was later purchased by Leslie Kaelbling at MIT and turned out to have a number of integrity problems, which several people at SRI, notably Melinda Gervasio, worked hard to correct.
Type of data: text
Variable labels of dataset (the names of the variables) CONTENT (EMAIL)|FROM (EMAIL)|TO (EMAIL)
Outline of data This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 0.5 million messages in total. The data were originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.
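
As an illustration of how the three variables listed above (CONTENT, FROM, TO) might be derived from a single message, here is a minimal sketch in Python. It assumes each message is stored as plain text with standard email headers followed by a single-part body; the function name extract_fields and the sample message are hypothetical and not part of the dataset.

```python
from email import message_from_string
from email.utils import getaddresses


def extract_fields(raw_message: str) -> dict:
    """Map one raw email to the CONTENT, FROM, and TO variables.

    Assumption: the message is single-part plain text. For a multipart
    message, get_payload() would return a list of parts instead of a string.
    """
    msg = message_from_string(raw_message)
    # getaddresses splits a header like "a@x.com, b@y.com" into
    # (display name, address) pairs; only the addresses are kept here.
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    return {
        "CONTENT": msg.get_payload(),
        "FROM": msg.get("From", ""),
        "TO": recipients,
    }


# Hypothetical sample message, written for this example only.
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Subject: Meeting\n"
    "\n"
    "See you at 10.\n"
)

record = extract_fields(sample)
print(record["FROM"], record["TO"])
```

Since the corpus is organized into per-user folders, the same function could be applied to each file found while walking the directory tree, though the exact on-disk layout should be checked against the downloaded archive.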
