This paper reports a study in automatic sentiment classification, i.e., automatically classifying documents as expressing positive or negative sentiments. The study investigates the effectiveness of using a machine-learning algorithm, support vector machine (SVM), on various text features to classify on-line product reviews into recommended (positive sentiment) and not recommended (negative sentiment). In the first part of this study, several approaches, unigrams (individual words), selected words (such as verb, adjective, and adverb), and words labeled with part-of-speech tags were investigated. Using SVM, the unigram approach obtained an accuracy rate of around 76%. Error analysis suggests various approaches for improving classification accuracy: handling of negation phrases, inferencing from superficial words, and handling the problem of comments on parts of the product. The second part of the study investigated the use of negation phrase n-grams to improve classification accuracy. This approach increased the accuracy rate to 79.33%. Compared with traditional subject classification which mainly uses unigrams, syntactic and semantic processing of text appear more important for sentiment classification. We expect that deeper linguistic processing will help increase accuracy for sentiment classification. D
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.