A Comparison of Authorship Attribution Approaches Applied on the Lithuanian Language

Kapočiūtė-Dzikienė, Jurgita; Venčkauskas, Algimantas; Damaševičius, Robertas

doi:10.15439/2017f110

Cited by 5 publications

(3 citation statements)

References 19 publications


“…With the recent increase in demand for various Natural Language Processing (NLP) technologies, such as chatbots [3], content classification [4], Sentiment Analysis [5][6][7], hate speech detection [8,9], authorship recognition and attribution [10], product and service recommenders [11,12], text summarization [13,14], email spam detection [15] and phishing detection [16], intent detection [17], and search optimization [18], ML models have presented a huge advantage and have created many opportunities for researchers in the field of text classification.…”

Section: Introduction (mentioning)

confidence: 99%

Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review

Palanivinayagam, El-Bayeh, Damaševičius 2023

Algorithms

Self Cite


Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Widely used machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews (PRISMA) statement is used as the guidelines for the systematic review process. The comprehensive differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification.


“…TC is a machine learning challenge that tries to classify new written content into a conceptual group from a predetermined classification collection [1]. It is crucial in a variety of applications, including sentiment analysis [2,3], spam email filtering [4,5], hate speech detection [6], text summarization [7], website classification [8], authorship attribution [9], information retrieval [10], medical diagnostics [11], emotion detection on smart phones [12], online recommendations [13], fake news detection [14,15], crypto-ransomware early detection [16], semantic similarity detection [17], part-of-speech tagging [18], news classification [19], and tweet classification [20].…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Text Classification Technique Using Improved Particle Swarm Optimization: A Case Study of Arabic Language

et al. 2022


We propose a novel text classification model, which aims to improve the performance of Arabic text classification using machine learning techniques. One of the effective solutions in Arabic text classification is to find the suitable feature selection method with an optimal number of features alongside the classifier. Although several text classification methods have been proposed for the Arabic language using different techniques, such as feature selection methods, an ensemble of classifiers, and discriminative features, choosing the optimal method becomes an NP-hard problem considering the huge search space. Therefore, we propose a method, called Optimal Configuration Determination for Arabic text Classification (OCATC), which utilized the Particle Swarm Optimization (PSO) algorithm to find the optimal solution (configuration) from this space. The proposed OCATC method extracts and converts the features from the textual documents into a numerical vector using the Term Frequency-Inverse Document Frequency (TF–IDF) approach. Finally, the PSO selects the best architecture from a set of classifiers to feature selection methods with an optimal number of features. Extensive experiments were carried out to evaluate the performance of the OCATC method using six datasets, including five publicly available datasets and our proposed dataset. The results obtained demonstrate the superiority of OCATC over individual classifiers and other state-of-the-art methods.


“…In the whole area of authorship identification, authorship attribution is the most explored topic for the morphologically complex Lithuanian language (the recent research work is described in [15], [16]). Unfortunately, the deep learning methods have never been applied on the Lithuanian language in any of these tasks, including AP.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

Lithuanian Author Profiling with the Deep Learning

Kapočiūtė-Dzikienė, Damaševičius 2018

Annals of Computer Science and Information Systems

Self Cite


We address the Lithuanian author profiling task in two dimensions (AGE and GENDER) using two deep learning methods (Long Short-Term Memory, LSTM, and Convolutional Neural Network, CNN) applied on top of Lithuanian neural word embeddings. We also investigate the impact of the training dataset size on the author profiling accuracy. The best results are achieved with the largest datasets, containing 5,000 instances in each class. In addition, LSTM was more effective on the smaller datasets, and CNN on the larger ones. We compare the deep learning methods with traditional machine learning methods (in particular, Naive Bayes Multinomial and Support Vector Machine) that use frequencies of elements as the feature representation. The comparison revealed that deep learning is not the best solution for our author profiling task.
