Abstract. This paper is a comparative study about text feature extraction methods in statistical learning of sentiment classification. Feature extraction is one of the most important steps in classification systems. We use stylometry to compare with TF-IDF and Delta TF-IDF baseline methods in sentiment classification. Stylometry is a research area of Linguistics that uses statistical techniques to analyze literary style. In order to assess the viability of the stylometry, we create a corpus of product reviews from the most traditional online service in Portuguese, namely, Buscapé. We gathered 2000 review about Smartphones. We use three classifiers, Support Vector Machine (SVM), Naive Bayes, and J48 to evaluate whether the stylometry has higher accuracy than the TF-IDF and Delta TF-IDF methods in sentiment classification. We found the better result with the SVM classifier (82,75%) of accuracy with stylometry and (72,62%) with Delta TF-IDF and (56,25%) with TF-IDF. The results show that stylometry is quite feasible method for sentiment classification, outperforming the accuracy of the baseline methods. We may emphasize that approach used has promising results.
Toxic comments contain forms of nonacceptable language targeted towards groups or individuals. These types of comments become a serious concern for government organizations, online communities, and social media platforms. Although there are some approaches to handle non-acceptable language, most of them focus on supervised learning and the English language. In this paper, we deal with toxic comment detection as a semi-supervised strategy over a heterogeneous graph. We evaluate the approach on a toxic dataset of the Portuguese language, outperforming several graph-based methods and achieving competitive results compared to transformer architectures.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.