Enron Email Dataset

Public

Profile

Title of the dataset	Enron Email Dataset
Provenance of the dataset	http://www.cs.cmu.edu/~enron/
How were the data collected/created? What was the cost?	The dataset was collected and prepared by the CALO Project. It was later purchased by Leslie Kaelbling at MIT and turned out to have a number of integrity problems, which several people at SRI, notably Melinda Gervasio, worked to correct.
Data sharing policy	Other

Type of data: Check all that apply. Use "Other" to specify other types so that we can include them in further updates.	text
Variable labels of dataset (the names of the variables)	CONTENT (EMAIL) | FROM (EMAIL) | TO (EMAIL)
Outline of data	This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains about 500,000 messages in total. The data was originally made public and posted to the web by the Federal Energy Regulatory Commission during its investigation of Enron.
Simulation process
Expected outcome of the process (obtained knowledge, analysis results, output of tools)
Anticipation for analyses/simulations other than the typical ones provided above
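
As a rough illustration of the three variables listed above (CONTENT, FROM, TO), the sketch below pulls them out of a single message with Python's standard `email` module. It assumes each message is stored as a plain-text file with RFC 822-style headers followed by a body. The sample message, its addresses, and the `extract_fields` helper are hypothetical and are not taken from the dataset.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Hypothetical sample message (not from the dataset), written in the
# header-then-body layout assumed for the corpus files.
SAMPLE = """\
Message-ID: <example-1@example.com>
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Example

Here is the body of the message.
"""


def extract_fields(raw: str) -> dict:
    """Return the FROM, TO and CONTENT variables for one raw message."""
    msg = message_from_string(raw)
    # A To header may hold several comma-separated addresses.
    recipients = [addr for _, addr in getaddresses(msg.get_all("To", []))]
    return {
        "FROM": parseaddr(msg.get("From", ""))[1],
        "TO": recipients,
        # For a non-multipart message the payload is the body text.
        "CONTENT": msg.get_payload(),
    }


if __name__ == "__main__":
    fields = extract_fields(SAMPLE)
    print(fields["FROM"], fields["TO"])
    print(fields["CONTENT"])
```

To process the whole corpus under the same assumption, one could walk each user's folders, read every file as text, and pass its contents to `extract_fields`.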