Computational Linguistics and Intelligent Text Processing

Gelbukh, Alexander

doi:10.1007/978-3-319-18111-0

Lecture Notes in Computer Science

2015

Cited by 4 publications

References 42 publications

(68 reference statements)

Rule-Based Arabic Sentiment Analysis using Binary Equilibrium Optimization Algorithm

Rahab, Haouassi, Laouid

2022

Arab J Sci Eng

With the development of websites and social networks, Internet users generate a massive amount of comments and information on the Web. Sentiment analysis, also called opinion mining, offers an opportunity to mine people’s sentiments and emotions from textual comments. In the last decade, sentiment analysis has been applied in research areas such as recommendation and support systems and has become an area of interest for many researchers. Many studies have been carried out on English, while other languages, such as Arabic, have received less attention. Increasingly, sentiment analysis researchers use machine learning because of its excellent performance. However, the generated models are black boxes that users cannot interpret. Rule-based classification is a promising approach for generating interpretable models. This work proposes a rule-based classification approach to Arabic sentiment analysis, together with a new binary equilibrium optimization metaheuristic algorithm as the optimization method for generating classification rules from Arabic documents. The proposed approach was evaluated on the Opinion Corpus for Arabic (OCA) and generates a classification model of thirteen rules. Comparison with state-of-the-art methods shows that the proposed approach outperforms all other white-box models in classification accuracy.

Research on Intelligent English Translation Method Based on the Improved Attention Mechanism Model

Wang

2021

Scientific Programming

The use of neural machine algorithms for English translation is a hot topic in current research. English translation using the traditional sequential neural framework, which is poor at capturing long-distance information, has major limitations. However, current improved frameworks, such as recurrent neural network translation, are not satisfactory either. In this paper, we establish an attention coding and decoding model to address the shortcomings of traditional machine translation algorithms, combine the attention mechanism with a neural network framework, and implement the whole English translation system in TensorFlow, thus improving translation accuracy. The experimental results show that the BLEU values of the model built in this paper improve to different degrees compared with traditional machine learning algorithms, which proves that the performance of the proposed model is significantly improved compared with the traditional model.
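The abstract reports its results as BLEU values but does not say how they were computed. As a rough illustration only, the sketch below shows the clipped (modified) n-gram precision that BLEU is built on. It is not the paper's evaluation code: the function names and the toy sentences are assumptions, and a full BLEU score would also combine several n-gram orders and apply a brevity penalty.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference.

    Each candidate n-gram is credited at most as many times as it
    occurs in the reference, so repeating a correct word does not help.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / total


# Hypothetical toy data, not taken from the paper.
candidate = "the the the cat".split()
reference = "the cat sat on the mat".split()
p1 = modified_precision(candidate, reference, 1)  # unigram precision
p2 = modified_precision(candidate, reference, 2)  # bigram precision
print(p1, p2)
```

Here "the" appears three times in the candidate but only twice in the reference, so only two of those occurrences count toward the unigram precision.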

An Optimal Feature Set for Stylometry-based Style Change detection at Document and Sentence Level

2022

Writing style change detection models focus on determining the number of authors of documents with or without known authors. Determining the exact number of authors contributing to a document, particularly when the authors contribute short texts in the form of a sentence, is still challenging because there are no standardized feature sets able to discriminate between the works of authors. Therefore, identifying the best feature set for all the tasks of writing style change detection is still considered important. This paper sought to determine the best feature set for the writing style change detection tasks: separating documents with several style changes (multi-authorship) from documents without any style changes (single-authorship), and determining the number and location of style changes in the case of multi-authorship. We performed exploratory research on existing stylometric features to determine the best document level and sentence level features. Document level features were extracted and used to separate single-authored from multi-authored documents, while sentence level features were used to determine the number of style changes. To answer this question, we trained a random forest classifier to rank document level features and sentence level features separately, and applied an ablation test on the top 15 sentence level features using the k-means clustering algorithm to confirm the effect of these features on model performance.
The study found that the best document level feature set for separating documents with and without style change was an ensemble of features with the number of sentence repetitions (num_sentence_repetitions) as the most determinant feature, followed by 5-grams, 4-grams, Special_character, sentence_begin_lower, sentence_begin_upper, diversity, automated_readability_index, parenthesis_count, first_word_uppercase, lensear_write_formula, dale_chall_readability, difficult_words, type_token_ratio. These were the top ranked features in experiment one. The top fifteen sentence level features, ranked using the random forest classifier, were diversity, dale_chall_readability grade, check_available_vowel, flesch_kincaid grade, parenthesis_count, colon_count, verbs, bigrams, alphabets, personal pronouns, coordinating conjunctions, interjections, modals, type_token ratio and punctuations_count. The optimal feature set for determining the number of style changes in documents was then chosen from the results of the ablation study on the top fifteen sentence level features: an ensemble of features including personal pronouns, check_available_vowels, punctuations_counts, parenthesis count, coordinating conjunctions and colon count.
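The abstract names its features but does not define them. A minimal sketch of how a few of the simpler sentence level features might be computed is given below. The feature names are borrowed from the abstract, while the definitions (whitespace tokenization, ASCII punctuation, counting both opening and closing parentheses) and the helper name sentence_features are assumptions rather than the authors' implementation.

```python
import string


def sentence_features(sentence: str) -> dict:
    """Compute a few simple sentence-level stylometric features.

    Names echo the abstract; the definitions here are assumptions.
    """
    tokens = sentence.lower().split()
    # Strip surrounding punctuation so "cat," and "cat" count as one type.
    words = [t.strip(string.punctuation) for t in tokens]
    words = [w for w in words if w]
    return {
        # Distinct words divided by total words (0.0 for an empty sentence).
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Number of ASCII punctuation characters in the raw sentence.
        "punctuations_count": sum(ch in string.punctuation for ch in sentence),
        # Opening and closing parentheses are both counted.
        "parenthesis_count": sentence.count("(") + sentence.count(")"),
        "colon_count": sentence.count(":"),
    }


# Hypothetical example sentence, not taken from the paper's data.
features = sentence_features("The cat (a tabby) sat: the cat slept.")
print(features)
```

Feature dictionaries of this kind, one per sentence, could then be passed to a ranking or clustering step such as the random forest and k-means stages the abstract describes.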
