ClueWeb12

  • Last Update: November 9, 2015; Created: November 9, 2015

Public

Profile

Title of the dataset: ClueWeb12
Provenance of the dataset: http://www.lemurproject.org/clueweb12.php/
How were the data collected/created? What was the cost? The ClueWeb12 datasets are distributed by Carnegie Mellon University for research purposes only. A dataset may be obtained by signing a data license agreement with Carnegie Mellon University and paying a fee that covers the cost of distributing the dataset.
Data sharing policy: Other

About data analysis and simulation

Type of data: text
Variable labels of dataset (the names of the variables)
Outline of data: The ClueWeb12 dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset. Distribution of ClueWeb12 began in January 2013.
Simulation process
Expected outcome of the process (obtained knowledge, analysis results, output of tools)
Anticipation for analyses/simulations other than the typical ones provided above

Other

Comments
What kind of data/tools do you wish to have?
Visualized information
Sample data

