With the growth of websites and social networks, Internet users generate a massive volume of comments and information on the Web. Sentiment analysis, also called opinion mining, makes it possible to mine people’s sentiments and emotions from these textual comments. Over the last decade, sentiment analysis has been applied in research areas such as recommendation and support systems and has become an area of interest for many researchers. Most studies, however, have been carried out on English, while other languages, such as Arabic, have received less attention. Sentiment analysis researchers increasingly use machine learning because of its strong performance, but the resulting models are black boxes that users cannot interpret. Rule-based classification is a promising approach for generating interpretable models. This work proposes a rule-based classification approach to Arabic sentiment analysis, together with a new binary equilibrium optimization metaheuristic algorithm that serves as the optimization method for generating classification rules from Arabic documents. The proposed approach was evaluated on the Opinion Corpus for Arabic (OCA) and produces a classification model of thirteen rules. Comparison with state-of-the-art methods shows that the proposed approach outperforms all other white-box models in classification accuracy.
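To make the idea of an interpretable, rule-based sentiment model concrete, here is a minimal sketch of how an ordered rule list could classify a tokenized document. The rule format, the English placeholder terms, and the names `RULES` and `classify` are all hypothetical; they are not the thirteen rules or the optimization procedure reported in the abstract.

```python
from typing import List, Set, Tuple

# A rule pairs a set of terms that must ALL appear in the document
# with the label to predict. These rules are hypothetical placeholders,
# not the rules generated by the approach described above.
Rule = Tuple[Set[str], str]

RULES: List[Rule] = [
    ({"excellent"}, "positive"),
    ({"boring", "long"}, "negative"),
    ({"boring"}, "negative"),
]


def classify(tokens: List[str], rules: List[Rule], default: str = "positive") -> str:
    """Return the label of the first rule whose terms all occur in tokens."""
    present = set(tokens)
    for terms, label in rules:
        if terms <= present:  # every term of the rule is in the document
            return label
    return default  # no rule fired


print(classify("the film was excellent".split(), RULES))
print(classify("boring and long".split(), RULES))
```

Because every prediction traces back to a single rule that fired, a reader can inspect why a document was labeled the way it was, which is the sense in which such models are called white-box.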
The use of neural machine translation algorithms for English translation is an active research topic. English translation with the traditional sequential neural framework has a major limitation: it captures long-distance information poorly. Current improved frameworks, such as recurrent neural network translation, are not satisfactory either. In this paper, we establish an attention-based encoding and decoding model to address the shortcomings of traditional machine translation algorithms, combine the attention mechanism with a neural network framework, and implement the whole English translation system in TensorFlow, thus improving translation accuracy. The experimental results show that the BLEU scores of the model built in this paper improve to varying degrees over those of traditional machine learning algorithms, which indicates that the proposed model performs significantly better than the traditional model.
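The abstract does not say which attention variant the system uses. As an illustration only, the following sketch assumes scaled dot-product attention over a handful of encoder states, written in plain Python rather than TensorFlow; the function names `softmax` and `attend` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Tuple[Vector, Vector]:
    """Scaled dot-product attention for one decoder query.

    Returns the attention weights over the encoder positions and the
    context vector (the weighted sum of the value vectors).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


weights, context = attend(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [0.0, 2.0]],
)
print(weights, context)
```

Because the decoder can weight every encoder position directly, distant source words can contribute to each output step, which is how attention addresses the long-distance limitation mentioned above.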
Writing style change detection models focus on determining the number of authors of documents with or without known authors. Determining the exact number of authors who contributed to a document, particularly when the authors contribute short texts such as a single sentence, is still challenging because no standardized feature set reliably discriminates between the works of different authors. Identifying the best feature set for all writing style change detection tasks is therefore still considered important. This paper sought to determine the best feature set for two such tasks: separating documents with several style changes (multi-authorship) from documents without any style changes (single-authorship), and determining the number and location of style changes in the case of multi-authorship. We performed exploratory research on existing stylometric features to determine the best document-level and sentence-level features. Document-level features were extracted and used to separate single-authored from multi-authored documents, while sentence-level features were used to determine the number of style changes. To this end, we trained a random forest classifier to rank document-level and sentence-level features separately, and applied an ablation test on the top 15 sentence-level features using the k-means clustering algorithm to confirm the effect of these features on model performance.
The study found that the best document-level feature set for separating documents with and without style changes was an ensemble of features with the number of sentence repetitions (num_sentence_repetitions) as the most determinant feature, together with 5-grams, 4-grams, Special_character, sentence_begin_lower, sentence_begin_upper, diversity, automated_readability_index, parenthesis_count, first_word_uppercase, lensear_write_formula, dale_chall_readability, difficult_words, and type_token_ratio. These were the top-ranked features in experiment one. The top fifteen sentence-level features, ranked with the random forest classifier, were diversity, dale_chall_readability grade, check_available_vowel, flesch_kincaid grade, parenthesis_count, colon_count, verbs, bigrams, alphabets, personal pronouns, coordinating conjunctions, interjections, modals, type_token ratio, and punctuations_count. The optimal feature set for determining the number of style changes in documents was then chosen from the results of the ablation study on these top fifteen sentence-level features. It was an ensemble of features including personal pronouns, check_available_vowels, punctuations_counts, parenthesis count, coordinating conjunctions, and colon count.
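As a rough illustration of what sentence-level stylometric features look like in practice, the sketch below computes four of the features named above for a single sentence. The exact definitions used in the study are not given in the abstract, so the tokenization and counting rules here, and the function name `sentence_features`, are assumptions.

```python
import re
import string
from typing import Dict


def sentence_features(sentence: str) -> Dict[str, float]:
    """Compute a few simple stylometric features for one sentence.

    The definitions are assumed for illustration, not taken from the study.
    """
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    # Type-token ratio: distinct words divided by total words.
    ttr = len(set(words)) / len(words) if words else 0.0
    return {
        "type_token_ratio": ttr,
        "punctuations_count": sum(ch in string.punctuation for ch in sentence),
        "parenthesis_count": sentence.count("(") + sentence.count(")"),
        "colon_count": sentence.count(":"),
    }


print(sentence_features("Note: the cat (the big cat) sat."))
```

Feature vectors like this one, computed per sentence, are the kind of input that a ranking classifier or a clustering algorithm could then consume.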