This article covers some successes and lessons learned during the development of a hybrid approach to Sentiment Analysis (SA) based on a Sentiment Lexicon, Semantic Rules, Negation Handling, Ambiguity Management and Linguistic Variables. The proposed hybrid method is presented and applied to two selected datasets: the Movie Review and Sentiment Twitter datasets. The results are compared against those obtained when the Naïve Bayes (NB) and Maximum Entropy (ME) supervised machine learning classification methods are applied to the same datasets. The proposed hybrid system attained higher accuracy and precision scores than NB and ME, which shows its superiority when applied to the SA problem at the sentence level. Finally, an alternative strategy is explored that calculates the orientation polarity and polarity intensity in one step, instead of the two-step method used in the hybrid approach. The analysis of the mixed results obtained with this alternative approach shows its potential as an aid in the computation of semantic orientations, and it produced some lessons for developing a more effective mechanism for calculating the orientation polarity and polarity intensity.
There are situations in which lexicon-based methods for Sentiment Analysis (SA) are not able to generate a classification output for specific instances of a given dataset. Most often, the reason is that the sentiment lexicon lacks specific terms required for the classification. In such cases, there have been only two possible paths to follow: (1) add terms to the lexicon by human intervention (an off-line process) to guarantee that no noise is introduced into the lexicon, which prevents the classification system from providing an immediate answer; or (2) use the services of a word-frequency dictionary (an on-line process), which is computationally costly to build. This paper investigates an alternative approach to compensate for the inability of a lexicon-based method to produce a classification output. The method is based on combining the classification outputs of non-lexicon-based tools. First, the outcome values of applying two or more non-lexicon classification methods are obtained. Second, these non-lexicon outcomes are fused using a uninorm-based approach, which has been proved to have the compensation properties required in the SA context, to generate the classification output that the lexicon-based approach is unable to produce. Experimental results are presented based on the execution of two well-known supervised machine learning algorithms, namely Naïve Bayes and Maximum Entropy, and the application of a cross-ratio uninorm operator. Performance indices associated with options (1) and (2) above are compared against the results obtained using the proposed approach for two different datasets. Additionally, the performance of the proposed cross-ratio uninorm approach is compared with the performance obtained when the arithmetic mean is used as the aggregation operator instead.
It is shown that combining non-lexicon-based classification methods with specific uninorm operators improves the classification performance of lexicon-based methods, and offers an alternative solution to the SA classification problem when one is needed. The proposed aggregation method could also be used as a replacement for the ensemble averaging techniques commonly applied when combining the outputs of several machine learning classifiers.
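The fusion step described in this abstract can be illustrated with a minimal sketch. The abstract names a cross-ratio uninorm but does not give its formula, so the code below assumes the commonly cited form with identity element 0.5, U(x, y) = xy / (xy + (1 - x)(1 - y)), and assumes that each classifier reports a positive-class score in [0, 1]. The function names `cross_ratio_uninorm`, `fuse_scores` and `arithmetic_mean` are hypothetical and are not taken from the paper.

```python
from functools import reduce
from typing import Iterable


def cross_ratio_uninorm(x: float, y: float) -> float:
    """Cross-ratio uninorm on [0, 1] with identity element 0.5 (assumed form)."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("inputs must lie in [0, 1]")
    numerator = x * y
    denominator = numerator + (1.0 - x) * (1.0 - y)
    if denominator == 0.0:
        # Only reached for (0, 1) or (1, 0); this sketch adopts the
        # conjunctive convention and returns 0 (an assumption).
        return 0.0
    return numerator / denominator


def fuse_scores(scores: Iterable[float]) -> float:
    """Fuse any number of classifier scores, starting from the identity 0.5."""
    return reduce(cross_ratio_uninorm, scores, 0.5)


def arithmetic_mean(scores: Iterable[float]) -> float:
    """Baseline aggregation that the abstract compares against."""
    values = list(scores)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical positive-class scores from two classifiers (e.g. NB and ME).
    agree_positive = [0.8, 0.7]
    disagree = [0.8, 0.2]
    print(fuse_scores(agree_positive), arithmetic_mean(agree_positive))
    print(fuse_scores(disagree), arithmetic_mean(disagree))
```

Under these assumptions, two scores on the same side of 0.5 reinforce each other (the fused value is more extreme than either input), whereas the arithmetic mean always lies between the inputs. Two scores that are symmetric about 0.5 cancel out to 0.5 under both operators.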
Abstract: This contribution presents a hybrid approach to Sentiment Analysis (SA) encompassing the use of semantic rules, fuzzy sets, unsupervised machine learning techniques and a sentiment lexicon improved with the support of SentiWordNet. A Hybrid Standard Classification is carried out first, and is then enhanced into a Hybrid Advanced approach that incorporates a linguistic classification of semantic polarity modelled using fuzzy sets. The mechanism of the new SA methodology is illustrated by applying it to compute the polarity of a given sentence and to a publicly available benchmark dataset: the Movie Review Dataset.