Lexicon-based methods for Sentiment Analysis (SA) sometimes cannot produce a classification output for specific instances of a dataset. Most often, the reason is that terms required for the classification are absent from the sentiment lexicon. In such cases, there have been only two possible paths to follow: (1) add terms to the lexicon by human intervention (an off-line process), which guarantees that no noise is introduced into the lexicon but prevents the classification system from providing an immediate answer; or (2) use the services of a word-frequency dictionary (an on-line process), which is computationally costly to build. This paper investigates an alternative approach to compensate for a lexicon-based method's inability to produce a classification output. The method is based on combining the classification outputs of non-lexicon-based tools. First, the outcome values of two or more non-lexicon classification methods are obtained. Second, these outcomes are fused using a uninorm-based approach, which has been proven to have the compensation properties required in the SA context, to generate the classification output that the lexicon-based approach is unable to produce. Experimental results are presented based on two well-known supervised machine learning algorithms, namely Naïve Bayes and Maximum Entropy, and the application of a cross-ratio uninorm operator. Performance indices associated with options (1) and (2) above are compared against the results obtained using the proposed approach on two different datasets. Additionally, the performance of the proposed cross-ratio uninorm approach is compared with that obtained when the arithmetic mean is used as the aggregation operator instead.
It is shown that combining non-lexicon-based classification methods with specific uninorm operators improves the classification performance of lexicon-based methods and provides an alternative solution to the SA classification problem when needed. The proposed aggregation method could also be used as a replacement for the ensemble averaging techniques commonly applied when combining the outputs of several machine learning classifiers.
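To make the fusion step concrete, the following is a minimal sketch of a cross-ratio uninorm next to the arithmetic mean. It uses the standard closed form U(x, y) = (1 - e)xy / ((1 - e)xy + e(1 - x)(1 - y)) with neutral element e, and it assumes that each classifier emits a positive-class score in [0, 1], that e = 0.5, and that the undefined (0, 1) case is resolved conjunctively to 0. The function names and the example scores are hypothetical and are not taken from the paper.

```python
from functools import reduce


def cross_ratio_uninorm(x: float, y: float, e: float = 0.5) -> float:
    """Cross-ratio uninorm on [0, 1] with neutral element e (0 < e < 1)."""
    if {x, y} == {0.0, 1.0}:
        # The formula is 0/0 here; a conjunctive convention is assumed.
        return 0.0
    num = (1.0 - e) * x * y
    den = num + e * (1.0 - x) * (1.0 - y)
    return num / den


def fuse(scores, e: float = 0.5) -> float:
    """Fold the uninorm over any number of classifier scores."""
    return reduce(lambda a, b: cross_ratio_uninorm(a, b, e), scores)


def arithmetic_mean(scores) -> float:
    """Baseline aggregation used for comparison."""
    return sum(scores) / len(scores)


# Hypothetical positive-class scores from two classifiers (e.g. NB and ME).
nb_score, me_score = 0.8, 0.7
fused = fuse([nb_score, me_score])
mean = arithmetic_mean([nb_score, me_score])
label = "positive" if fused > 0.5 else "negative" if fused < 0.5 else "neutral"
print(f"uninorm={fused:.4f} mean={mean:.4f} label={label}")
```

Under these assumptions, two scores on the same side of 0.5 reinforce each other (the fused value moves further from 0.5 than either input), a score of exactly 0.5 leaves the other unchanged, and scores on opposite sides compensate. The arithmetic mean, by contrast, always stays between its inputs.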
This article covers some successes and lessons learned during the development of a hybrid approach to Sentiment Analysis (SA) based on a Sentiment Lexicon, Semantic Rules, Negation Handling, Ambiguity Management and Linguistic Variables. The proposed hybrid method is presented and applied to two selected datasets: the Movie Review and Sentiment Twitter datasets. The results are compared against those obtained when the Naïve Bayes (NB) and Maximum Entropy (ME) supervised machine learning classification methods are applied to the same datasets. The proposed hybrid system attained higher accuracy and precision scores than NB and ME, which shows its superiority when applied to the SA problem at the sentence level. Finally, an alternative strategy is explored that calculates the orientation polarity and polarity intensity in one step instead of the two-step method used in the hybrid approach. The analysis of the mixed results achieved with this alternative approach shows its potential as an aid in the computation of semantic orientations and produced some lessons learned for developing a more effective mechanism for calculating the orientation polarity and polarity intensity.