ClueWeb12
Last Update: November 9, 2015 Created: November 9, 2015
Public
Profile
Title of the dataset | ClueWeb12 |
---|---|
Provenance of the dataset | http://www.lemurproject.org/clueweb12.php/ |
How were the data collected/created? What was the cost? | The ClueWeb12 datasets are distributed by Carnegie Mellon University for research purposes only. A dataset may be obtained from Carnegie Mellon by signing a data license agreement with Carnegie Mellon University, and paying a fee that covers the cost of distributing the dataset. |
Data sharing policy | Other |
About data analysis and simulation
Type of data | text |
---|---|
Variable labels of dataset (the names of the variables) | |
Outline of data | The ClueWeb12 dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset. Distribution of ClueWeb12 began in January 2013. |
Simulation process | |
Expected outcome of the process (obtained knowledge, analysis results, output of tools) | |
Anticipation for analyses/simulations other than the typical ones provided above |
Other
Comments | |
---|---|
What kind of data/tools do you wish to have? | |
Visualized information | |
Sample data |