Sentiment analysis methods have become increasingly popular, especially with the rapid growth in the number of social media users. In this context, this paper presents a sentiment analysis approach that faithfully captures the sentiment orientation of Arabic Twitter posts, based on a novel data representation and machine learning techniques. The proposed approach applies a wide range of features: lexical, surface-form, syntactic, etc. We also use lexicon features inferred from two Arabic sentiment word lexicons. To build our supervised sentiment analysis system, we use several standard classification methods (Support Vector Machines, K-Nearest Neighbour, Naïve Bayes, Decision Trees, Random Forest) known for their effectiveness on such classification problems. In our study, the Support Vector Machines classifier outperforms the other supervised algorithms in Arabic Twitter sentiment analysis. Through ablation experiments, we show the positive impact of lexicon-based features on prediction performance.
CCS Concepts: • Computing methodologies → Artificial intelligence → Natural language processing → Language resources • Computing methodologies → Machine learning approaches.
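To make the idea of feature groups and lexicon ablation concrete, the following is a minimal sketch, not the authors' implementation. The two tiny word sets, the feature names, and the functions `surface_features`, `lexicon_features`, and `extract_features` are all hypothetical; the abstract does not name the lexicons or list the exact features.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy lexicons (assumption): stand-ins for the two Arabic
# sentiment lexicons mentioned in the abstract, which are not named there.
POSITIVE_LEXICON: Set[str] = {"رائع", "جميل", "سعيد"}
NEGATIVE_LEXICON: Set[str] = {"سيء", "حزين", "فظيع"}


def surface_features(tokens: List[str]) -> Dict[str, float]:
    """Simple surface-form counts computed from whitespace tokens."""
    return {
        "surface:n_tokens": float(len(tokens)),
        "surface:n_exclaim": float(sum(t.count("!") for t in tokens)),
        "surface:n_hashtags": float(sum(t.startswith("#") for t in tokens)),
    }


def lexicon_features(tokens: List[str]) -> Dict[str, float]:
    """Counts of lexicon hits and their difference as a crude polarity score."""
    n_pos = sum(t in POSITIVE_LEXICON for t in tokens)
    n_neg = sum(t in NEGATIVE_LEXICON for t in tokens)
    return {
        "lexicon:n_pos": float(n_pos),
        "lexicon:n_neg": float(n_neg),
        "lexicon:polarity": float(n_pos - n_neg),
    }


def extract_features(
    text: str, groups: Iterable[str] = ("surface", "lexicon")
) -> Dict[str, float]:
    """Build a feature dict from the requested groups.

    Leaving a group out of `groups` is the ablation step: the same
    classifier can be retrained without that group and the scores compared.
    """
    tokens = text.split()
    enabled = set(groups)
    features: Dict[str, float] = {}
    if "surface" in enabled:
        features.update(surface_features(tokens))
    if "lexicon" in enabled:
        features.update(lexicon_features(tokens))
    return features


if __name__ == "__main__":
    tweet = "هذا الفيلم رائع !"
    print(extract_features(tweet))                        # all groups
    print(extract_features(tweet, groups=("surface",)))   # lexicon ablated
```

Feature dictionaries of this shape could be passed to a vectorizer and then to any of the classifiers listed in the abstract; the choice of classifier is independent of this sketch.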
In this paper, we propose TunDiaWN (Tunisian dialect Wordnet), a lexical resource for the dialect spoken in Tunisia. Our TunDiaWN construction approach is founded, on the one hand, on a corpus-based method to analyze and extract Tunisian dialect words. A clustering technique is adapted and applied to mine the possible relations between the extracted Tunisian dialect words and to group them into meaningful clusters. All these suggestions are then evaluated and validated by experts to perform the resource enrichment task. On the other hand, we reuse other Wordnet versions, mainly for English and Arabic, to propose a new database structure enriched with innovative features and entities.
In this paper, we present an approach to automatically extract and classify opinions in texts. We propose a similarity measure that computes the semantic distance between a word and predefined subgroups of seed words. We evaluated our algorithm on the corpus of the "SemEval 2007" semantic evaluation campaign and obtained best Precision and F1 values of 62% and 61%, respectively, an improvement of 20% over the other participants.
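The abstract does not specify the similarity measure, so the following is only an illustrative sketch under stated assumptions: words are represented as small numeric vectors, similarity is cosine similarity, and a word's score for a subgroup is the mean similarity to that subgroup's seed words. The function names, the labels, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similarity(
    word_vec: Sequence[float], seed_vecs: List[Sequence[float]]
) -> float:
    """Mean similarity between a word and the seed words of one subgroup."""
    return sum(cosine(word_vec, s) for s in seed_vecs) / len(seed_vecs)


def classify_word(
    word_vec: Sequence[float], seed_groups: Dict[str, List[Sequence[float]]]
) -> Tuple[str, Dict[str, float]]:
    """Assign the word to the subgroup with the highest mean similarity."""
    scores = {
        label: group_similarity(word_vec, seeds)
        for label, seeds in seed_groups.items()
    }
    best_label = max(scores, key=lambda label: scores[label])
    return best_label, scores


if __name__ == "__main__":
    # Toy 2-d vectors (assumption): real vectors would come from a corpus
    # or a lexical resource, neither of which the abstract describes.
    seed_groups = {
        "positive": [[1.0, 0.0], [0.9, 0.1]],
        "negative": [[0.0, 1.0], [0.1, 0.9]],
    }
    label, scores = classify_word([0.8, 0.2], seed_groups)
    print(label, scores)
```

Averaging over seed words is one design choice among several; taking the maximum similarity to any single seed would favor subgroups with one very close seed instead.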