• Last Update: November 9, 2015 Created: November 9, 2015



Title of the dataset ClueWeb12
Provenance of the dataset
How were the data collected/created? What was the cost? The ClueWeb12 datasets are distributed by Carnegie Mellon University for research purposes only. A dataset may be obtained from Carnegie Mellon by signing a data license agreement with Carnegie Mellon University, and paying a fee that covers the cost of distributing the dataset.
Data sharing policy Other

About data analysis and simulation

Type of data: text
Variable labels of dataset (the names of the variables)
Outline of data The ClueWeb12 dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset. Distribution of ClueWeb12 began in January 2013.
Simulation process
Expected outcome of the process (obtained knowledge, analysis results, output of tools)
Anticipation for analyses/simulations other than the typical ones provided above


What kind of data/tools do you wish to have?
Visualized information
Sample data
  • ※ Knowledge graph: a graph showing the relationships among the solutions, requests, and related data associated with this dataset


