Sentiment Classification using Machine Learning Techniques
Pranjal Vachaspati
pranjal@mit.edu
Cathy Wu
cathywu@mit.edu
Abstract
We implement a series of classifiers (Naive Bayes, Max-
imum Entropy, and SVM) to distinguish positive and nega-
tive sentiment in critic and user reviews. We apply various
processing methods, including negation tagging, part-of-
speech tagging, and position tagging to achieve maximum
accuracy. We test our classifiers on an external dataset to
see how well they generalize. Finally, we use a majority-
voting technique to combine classifiers and achieve accu-
racy of close to 90% in 3-fold cross-validation, far outper-
forming Pang’s 2002 work [7].
1. Introduction
Sentiment analysis, broadly speaking, is the set of tech-
niques that allows detection of emotional content in text.
This has a variety of applications: it is commonly used by
trading algorithms to process news articles, as well as by
corporations to better respond to consumer service needs.
Similar techniques can also be applied to other text analysis
problems, like spam filtering.
The source code described in this paper is available at
https://github.com/cathywu/Sentiment-Analysis.
2. Previous Work
We set out to replicate Pang’s work [7] from 2002 on
using classical knowledge-free supervised machine learning
techniques to perform sentiment classification. They
applied three machine learning methods commonly used for
topic classification (Naive Bayes, maximum entropy
classification, and support vector machines) to explore
the difference between topic and sentiment classification in
documents. Pang cited a number of related works, but they
mostly pertain to classifying documents on criteria weakly
tied to sentiment or using knowledge-based sentiment clas-
sification methods. We used a similar dataset, as released
by the authors, and made efforts to use the same libraries
and pre-processing techniques.
In addition to replicating Pang’s work as closely as we
could, we extended the work by exploring an additional
dataset, additional preprocessing techniques, and combin-
ing classifiers. We tested how well classifiers trained on
Pang’s dataset extended to reviews in another domain. Although
Pang limited many of their tests to the
16165 most common ngrams, faster processors have
lifted this computational constraint, and so we additionally
tested on all ngrams. For maximum entropy classification, we
used a newer parameter estimation algorithm, the
Limited-Memory Variable Metric method (L-BFGS) [5], whereas Pang
used the Improved Iterative Scaling method. We also implemented
and tested the effect of term frequency-inverse document
frequency (TF-IDF) weighting on classification results.
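As an illustration of the ngram cutoff and TF-IDF weighting mentioned above, the following is a minimal sketch and not the code from our repository. The function names (`ngrams`, `top_k_vocabulary`, `tfidf`), the toy documents, and the specific weighting formula tf * log(N / df) are assumptions chosen for clarity; other TF-IDF variants exist.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_k_vocabulary(docs, n, k):
    """Keep the k n-grams with the highest total count across all documents."""
    counts = Counter(g for doc in docs for g in ngrams(doc, n))
    return [g for g, _ in counts.most_common(k)]

def tfidf(docs, vocab, n):
    """Return one {ngram: weight} dict per document, using tf * log(N / df).

    This is one common TF-IDF variant, assumed here for illustration.
    """
    doc_counts = [Counter(ngrams(doc, n)) for doc in docs]
    num_docs = len(docs)
    # Document frequency: number of documents containing each n-gram.
    df = {g: sum(1 for c in doc_counts if g in c) for g in vocab}
    return [
        {g: c[g] * math.log(num_docs / df[g]) for g in vocab if c[g] > 0}
        for c in doc_counts
    ]

# Toy documents (hypothetical, not drawn from either dataset).
docs = [
    "a great great film".split(),
    "a dull film".split(),
    "a great cast".split(),
]
vocab = top_k_vocabulary(docs, n=1, k=4)
weights = tfidf(docs, vocab, n=1)
```

In this sketch, an ngram that appears in every document (here the unigram "a") receives weight zero, which is the intended effect of the inverse document frequency term: ubiquitous ngrams carry little information about sentiment.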
3. The User Review Domain
For our experiments, we worked with movie reviews.
Our data source was Pang’s released dataset
(http://www.cs.cornell.edu/people/pabo/movie-review-data/)
from their 2004 publication. The dataset contains
1000 positive reviews and 1000 negative reviews, each
labeled with its true sentiment. The original data source
was the Internet Movie Database (IMDb).
Pang applied the bag-of-words method to positive and
negative sentiment classification, but the same method can
be extended to various other domains, including topic classification.
We additionally chose to work with a set of 5000
Yelp reviews, 1000 for each of the five star ratings. Yelp
is a popular online city guide that hosts reviews
of restaurants, shopping areas, and businesses. Although
a movie review and a Yelp review will differ in specialized
vocabulary, audience, tone, etc., the ways that people convey
sentiment (e.g. "I loved it!") may not differ entirely. We
wished to explore how classifiers trained in one domain
might generalize to neighboring domains.
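The cross-domain setup can be sketched as follows: train a bag-of-words classifier on one domain, then measure accuracy on another. This is a minimal illustration under stated assumptions, not our experimental code. It uses a multinomial Naive Bayes model with add-one smoothing, skips words unseen in training, and runs on hypothetical toy sentences standing in for the movie and Yelp reviews. All function names are illustrative.

```python
import math
from collections import Counter

def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model from (tokens, label) pairs."""
    word_counts = {}        # label -> Counter of word counts
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for tokens, label in labeled_docs:
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab

def classify(model, tokens):
    """Return the label with the highest log posterior.

    Words never seen in training are skipped (an assumption of this sketch).
    """
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in tokens:
            if w in vocab:
                # Add-one (Laplace) smoothing over the training vocabulary.
                score += math.log(
                    (word_counts[label][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, labeled_docs):
    """Fraction of documents whose predicted label matches the true label."""
    correct = sum(classify(model, t) == y for t, y in labeled_docs)
    return correct / len(labeled_docs)

# Hypothetical toy data: train on "movie" sentences, test on "Yelp" sentences.
movie_train = [
    ("i loved it a wonderful film".split(), "pos"),
    ("great acting and a great plot".split(), "pos"),
    ("i hated it a terrible film".split(), "neg"),
    ("boring plot and awful acting".split(), "neg"),
]
yelp_test = [
    ("i loved it great food".split(), "pos"),
    ("terrible service and awful food".split(), "neg"),
]
model = train_naive_bayes(movie_train)
cross_domain_accuracy = accuracy(model, yelp_test)
```

Domain-specific words such as "food" and "service" never occur in the training data and contribute nothing here, so the predictions rest entirely on the sentiment vocabulary the two domains share.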
The domain of reviews is experimentally convenient because
reviews are widely available online and because reviewers
often summarize their overall sentiment with a
machine-extractable rating indicator; hence, there was no
need for hand-labeling of data.