Subjectivity datasets of Movie Review Data

  • Created: November 24, 2015; last updated: November 24, 2015



Title of the dataset: Subjectivity datasets of Movie Review Data
Provenance of the dataset
How were the data collected/created? What was the cost? The starting points for data acquisition were snippets of movie reviews from Rotten Tomatoes and plot summaries of movies from the Internet Movie Database (IMDb). The HTML files in the folder subj are web pages containing Rotten Tomatoes snippets; the HTML files in the folder obj are pages containing IMDb plot summaries.
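If you have raw pages like the ones described above, a minimal sketch for turning them into plain text might look like the following. It assumes (our assumption, not something the distributors state) that the pages sit in sibling folders named subj and obj under one root directory, end in .html or .htm, and decode as UTF-8. The helper names `html_to_text` and `load_raw_pages` are hypothetical; only Python's standard-library `html.parser` is used.

```python
import os
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects non-empty text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML page, chunks joined by single spaces."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def load_raw_pages(root):
    """Return (label, file_name, text) tuples for pages in root/subj and root/obj.

    Assumed layout (hypothetical): root/subj/*.html and root/obj/*.html, UTF-8.
    """
    pages = []
    for label in ("subj", "obj"):
        folder = os.path.join(root, label)
        for name in sorted(os.listdir(folder)):
            if name.lower().endswith((".html", ".htm")):
                path = os.path.join(folder, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    pages.append((label, name, html_to_text(fh.read())))
    return pages
```

Calling `load_raw_pages("path/to/root")` lists subj pages before obj pages. Splitting the extracted text into individual snippet or plot-summary sentences is a separate step that this sketch does not attempt.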
Data sharing policy: Other

About data analysis and simulation

Type of data: text
Variable labels of dataset (the names of the variables): OBJECTIVE SENTENCES (TEXT) | SUBJECTIVE SENTENCES (TEXT)
Outline of data: This dataset is distributed for use in sentiment-analysis experiments and contains 5000 subjective and 5000 objective processed sentences. It was introduced in Pang and Lee (ACL 2004) and released in June 2004. The same movie-review data distribution also offers collections of review documents labeled with their overall sentiment polarity (positive or negative) or subjective rating (e.g., two and a half stars), as well as sentences labeled with their subjectivity status (subjective or objective) or polarity.
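A minimal loading sketch for the two sentence classes, assuming the processed sentences come as two plain-text files with one sentence per line (one file per class) in Latin-1 encoding. The file paths are left as parameters: commonly distributed copies name them `quote.tok.gt9.5000` (subjective) and `plot.tok.gt9.5000` (objective), but treat those names and the encoding as assumptions and check your download. The function names are hypothetical, and the seeded split is only a convenience for quick experiments, not an evaluation protocol taken from the paper.

```python
import random


def load_subjectivity(subjective_path, objective_path, encoding="latin-1"):
    """Return (sentence, label) pairs; label is "subjective" or "objective".

    Assumes one sentence per line; the Latin-1 default is an assumption.
    """
    data = []
    for path, label in ((subjective_path, "subjective"), (objective_path, "objective")):
        with open(path, encoding=encoding) as fh:
            data.extend((line.strip(), label) for line in fh if line.strip())
    return data


def train_test_split(data, test_fraction=0.1, seed=0):
    """Shuffle a copy of data with a fixed seed and split off a test portion."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]
```

With the released files, the loaded list should hold 5000 pairs of each label, matching the counts given in the outline; a different count is a quick sign that the wrong files or encoding were used.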
Simulation process: sentiment classification, summarization, sentiment categorization, sentiment polarity, rating evaluation
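For the classification processes listed, a common baseline for separating subjective from objective sentences is a multinomial Naive Bayes classifier over word counts. The sketch below is a generic standard-library implementation; the add-one smoothing, lowercase whitespace tokenization, and the class name `NaiveBayes` are our own choices, not the exact configuration used by Pang and Lee. Labels are whatever strings appear in the training pairs, for example "subjective" and "objective".

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        """Train on (sentence, label) pairs; returns self for chaining."""
        self.class_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for sentence, label in examples:
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_score(self, sentence, label):
        """Log prior plus smoothed log likelihood of the sentence's words."""
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n)
        v = len(self.vocab)
        for w in tokenize(sentence):
            if w in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][w] + 1) / (self.totals[label] + v))
        return score

    def predict(self, sentence):
        """Return the label with the highest log score."""
        return max(self.class_counts, key=lambda c: self.log_score(sentence, c))
```

Training on (sentence, label) pairs built from the subjective and objective sentence sets and calling `predict` on held-out sentences gives a quick subjectivity baseline to compare more elaborate models against.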
Expected outcome of the process (obtained knowledge, analysis results, output of tools): polarity of reviews, summarized reviews, clusters, categories
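One way to produce summarized reviews is to keep only the subjective sentences of a review. The title of the ACL 2004 paper refers to doing this with minimum cuts; the sketch below shows a generic version of that idea under our own assumptions. Each sentence i gets non-negative scores `ind_sub[i]` and `ind_obj[i]` for the two classes, pairs of sentences get non-negative association weights that penalize separating them, and an s-t minimum cut (computed here with Edmonds-Karp max flow) picks the partition with the lowest total penalty. The function name and all scores are hypothetical; this is not the authors' code.

```python
from collections import deque

EPS = 1e-12  # residual capacities below this are treated as zero


def min_cut_subjective(ind_sub, ind_obj, assoc):
    """Return indices of sentences on the source (subjective) side of a minimum cut.

    ind_sub[i], ind_obj[i]: non-negative individual scores (hypothetical inputs).
    assoc: dict {(i, j): w} of non-negative association weights between sentences.
    """
    n = len(ind_sub)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        cap[s][i] += ind_sub[i]  # paid if sentence i ends up objective
        cap[i][t] += ind_obj[i]  # paid if sentence i ends up subjective
    for (i, j), w in assoc.items():
        cap[i][j] += w  # undirected association modeled as two arcs
        cap[j][i] += w

    # Edmonds-Karp: repeatedly augment along a shortest residual s-t path.
    while True:
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > EPS:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            break  # no augmenting path left: flow is maximum
        flow, v = float("inf"), t
        while v != s:  # bottleneck capacity along the path
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:  # update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # Nodes still reachable from s in the residual graph form the source side.
    reachable = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n + 2):
            if v not in reachable and cap[u][v] > EPS:
                reachable.add(v)
                queue.append(v)
    return {i for i in range(n) if i in reachable}
```

For example, `min_cut_subjective([0.9, 0.6, 0.1], [0.1, 0.4, 0.9], {(1, 2): 1.0})` returns `{0}`: sentence 1 leans subjective on its own, but its strong association with the clearly objective sentence 2 pulls it to the objective side. With an empty association dict the result is `{0, 1}`. The dense capacity matrix keeps the code short and is intended for the few sentences of a single review, not for large graphs.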
Anticipation for analyses/simulations other than the typical ones provided above


Comments:
  • Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. "Thumbs up? Sentiment Classification using Machine Learning Techniques." Proceedings of EMNLP 2002.
  • Bo Pang and Lillian Lee. "A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts." Proceedings of ACL 2004.
  • Bo Pang and Lillian Lee. "Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales." Proceedings of ACL 2005.


