Author identification for Under-Resourced language (KadazanDusun)

Tarmizi, Nursyahirah; Saee, Suhaila; Ibrahim, Dayang Hanani Abang

doi:10.11591/ijeecs.v17.i1.pp248-255

Cited by 5 publications

(4 citation statements)

References 18 publications

(27 reference statements)


“…Another interesting tweet-based author identification using n-grams and word n-grams that treats the single-labelled multi-class problem in order to identify author. This study (Tarmizi 2020) demonstrates outstanding deployment of SVM over Naïve Bayes classifier.…”

Section: Statistical and Linguistic Models (mentioning)

confidence: 76%

State of the Art in Authorship Attribution With Impact Analysis of Stylometric Features on Style Breach Prediction

Prasad

Chakkaravarthy

2022

Journal of Cases on Information Technology


This paper studies the most influential research spanning the domains of authorship attribution and stylometry. The reference material contributes robust classifiers with a reasonable array of feature extraction techniques, such as Dirichlet–multinomial change point regression to track how writing style evolves over time, including gradual changes in style as the author ages and abrupt shifts. The paper presents a quantitative evaluation of the research in terms of year-wise research output, diversity of applications, nature of collaboration, characteristics of highly productive techniques, and the benchmark performance criteria used by high-impact researchers. The outcomes of this study can be deployed for dialectology analysis, corpus linguistics, stylistics, natural language processing, classification, literary and historical analysis, and forensic analysis.


“…Tarmizi et al [6] present the task of Author Identification for KadazanDusun language by using tweets as the source of data. The feature extraction used is a combination of n-grams which n is from 1 to 5.…”

Section: Of 13 (mentioning)

confidence: 99%
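The citation statement above describes features built from a combination of n-grams with n from 1 to 5. A minimal sketch of that idea follows, assuming simple occurrence counts over a token sequence. The function name `ngram_counts`, its parameters, and the sample text are hypothetical illustrations and are not taken from the cited paper, which may tokenize and weight features differently.

```python
from collections import Counter


def ngram_counts(tokens, n_min=1, n_max=5):
    """Count every contiguous n-gram for n in [n_min, n_max].

    `tokens` may be a string (character n-grams) or a list of words
    (word n-grams). Each n-gram is stored as a tuple so that both
    cases share one representation.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample text; a real experiment would use tweets.
sample = "this is a short sample tweet"

char_features = ngram_counts(sample)          # character 1- to 5-grams
word_features = ngram_counts(sample.split())  # word 1- to 5-grams
```

The resulting counters can be turned into fixed-length vectors over a shared vocabulary before being passed to a classifier.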

Natural Disaster on Twitter: Role of Feature Extraction Method of Word2Vec and Lexicon Based for Determining Direct Eyewitness

et al. 2021


Researchers have collected Twitter data to study a wide range of topics, one of which is natural disasters. Existing research developed a social network sensor that separates tweets from direct eyewitnesses, tweets from non-eyewitnesses, and tweets unrelated to natural disasters. It can be used as a tool for early warning or monitoring when natural disasters occur. The main component of the social network sensor is tweet text classification. As in text classification research generally, the challenge is the feature extraction method that converts Twitter text into structured data. The commonly used strategy is a vector space representation, which can produce high-dimensional data. This research focuses on feature extraction methods that address the high-dimensionality problem. We propose a hybrid approach of word2vec-based and lexicon-based feature extraction to produce new features. The experimental results show that the proposed method uses fewer features and improves classification performance, with an average AUC of 0.84 using 150 features. This value is obtained by using only the word2vec-based method. In the end, this research shows that the lexicon-based features did not improve the performance of social network sensor predictions for natural disasters.

Highlights: Text classification is mostly applied to sentiment analysis and is still rarely used to determine direct eyewitnesses of natural disasters. A common problem in text mining research is that features extracted with the vector space representation method generate high-dimensional data. A hybrid approach of word2vec-based and lexicon-based feature extraction was tested in order to find a method that generates new low-dimensional features and also improves classification performance.


“…The calculations are continuing to prognosticating the correlation by calculating the covariance by implementing (2). Table 3 shows these calculation steps for x (number of friends), and y (friends and retweet) as two variables.…”

Section: ∑ ̅ (mentioning)

confidence: 99%

“…Online social networks grew to become a significant phenomenon throughout the last years [1,2]. Where it has a vital impact on sharing, receiving and breaking news based on users' relationships [3].…”

Section: Introduction (mentioning)

confidence: 99%

Online social network relationships influenced on a retweeting

Abbood

Hasson

2020

IJEECS


Social network users spend a lot of time posting, searching, interacting, and reading the news on blogging platforms. In this era, social media is becoming a suitable place for discovering and exchanging new updates. Common social media platforms let users share news online with one click, and this ease of use means breaking news often shows up first on microblogs. Twitter is one of the best-known microblogging platforms, with more than 250 million users, and retweeting is a convenient way to share and see news. It is important to predict retweeting and influence within a social relationship. The correlation coefficient formula has been used to determine the level of correlation between a user and his retweeters (followers, friends, and strangers) in social networks. This correlation can be computed from collected Twitter user information, using features that have a main effect on retweet behavior. This study focuses on friends, followers, and retweets as the promising source of relationships between social media users. Experimental results on a Twitter dataset showed that the correlation coefficient formula can be used as a predictive model, and that it is a general framework for calculating the correlation between the user, friends, and followers in social networks. Their influence on the accuracy of retweet prediction is also examined.
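The abstract above relies on a correlation coefficient computed through the covariance of two variables. A minimal sketch follows, assuming the Pearson correlation coefficient is the formula meant; the abstract does not name the variant. The function name `pearson` and the sample numbers are hypothetical and are not data from the cited study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations (covariance up to a constant factor).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (variances up to the same factor).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical illustration: friend counts vs. retweets from friends.
friends = [10, 20, 30, 40]
retweets_from_friends = [1, 3, 2, 5]
r = pearson(friends, retweets_from_friends)
```

A value of `r` near 1 or -1 indicates a strong linear relationship between the two variables, while a value near 0 indicates little linear relationship.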
