Abstract: Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence, often characterized as AI-hard or AI-complete. Many problems can be addressed with WSD approaches, including sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we tackle the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, …
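The abstract above describes disambiguating word senses with Word2vec vectors built from Wikipedia-derived corpora. A minimal sketch of the general idea (not the authors' implementation — the toy vectors, sense glosses, and function names below are illustrative) is to average the embeddings of the context words and pick the sense whose gloss vector is closest by cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_vector(words, vecs):
    """Average the embeddings of the words that have a vector."""
    vs = [vecs[w] for w in words if w in vecs]
    if not vs:
        return None
    dim = len(vs[0])
    return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

def disambiguate(context, sense_glosses, vecs):
    """Pick the sense whose gloss vector is closest to the context vector."""
    ctx = avg_vector(context, vecs)
    best, best_score = None, -2.0
    for sense, gloss in sense_glosses.items():
        g = avg_vector(gloss, vecs)
        if ctx is None or g is None:
            continue
        score = cosine(ctx, g)
        if score > best_score:
            best, best_score = sense, score
    return best

# Toy 2-d embeddings standing in for Word2vec vectors trained on Wikipedia.
vecs = {
    "money": [1.0, 0.1], "deposit": [0.9, 0.2], "loan": [0.95, 0.15],
    "river": [0.1, 1.0], "water": [0.2, 0.9], "shore": [0.15, 0.95],
}
# Hypothetical gloss words for two senses of "bank".
senses = {
    "bank/finance": ["money", "deposit", "loan"],
    "bank/river": ["river", "water", "shore"],
}
print(disambiguate(["deposit", "money"], senses, vecs))  # bank/finance
```

With real Word2vec vectors the comparison runs over full-dimensional embeddings and richer sense glosses, but the scoring step is the same.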
“…A paraphrase lexicon is also needed to detect phrases in the document. Several previous studies applied word sense disambiguation and were shown to improve results and contextual understanding [41,42].…”
Document searching with queries that capture context can better reflect the user's intent when retrieving documents. Many studies have addressed understanding the context of a query, but differences across languages can require different methods of context understanding; the methods used in previous studies therefore need improvement. In this paper, we propose a query expansion method based on BabelNet search and Word Embedding (BabelNet Embedding). The query expansion method focuses on developing queries based on the semantic relationships within a query in order to understand its context. Candidate queries were developed by finding synonyms, measuring similarity using WordNet, Word Embedding over all Wikipedia articles, and BabelNet Embedding over Wikipedia Online articles. We compared our proposed method with an existing semantic query expansion method. Our method retrieved relevant documents more accurately, achieving 89% accuracy when searching Arabic documents.
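One common form of embedding-filtered query expansion, loosely in the spirit of the method described above, is to gather synonym candidates (e.g., from WordNet or BabelNet) and keep only those whose embedding similarity to the original term passes a threshold. A minimal sketch, assuming a hand-built synonym table and toy vectors (both illustrative, not the authors' resources):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_terms, synonyms, vecs, threshold=0.75):
    """Expand a query with synonym candidates that pass an embedding-similarity check."""
    expanded = list(query_terms)
    for term in query_terms:
        for cand in synonyms.get(term, []):
            if term in vecs and cand in vecs and cosine(vecs[term], vecs[cand]) >= threshold:
                expanded.append(cand)
    return expanded

# Toy synonym table standing in for WordNet/BabelNet lookups.
synonyms = {"car": ["automobile", "plant"]}
# Toy embeddings; "plant" is deliberately a poor match for "car".
vecs = {"car": [1.0, 0.0], "automobile": [0.95, 0.1], "plant": [0.5, 0.5]}
print(expand_query(["car"], synonyms, vecs))  # ['car', 'automobile']
```

The similarity filter is what keeps spurious lexical synonyms (here, "plant") out of the expanded query.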
“…The above technique is also used for measuring document similarity [8]. This technique is also used to extract Twitter data features for crisis event classification [9].…”
Researchers have collected Twitter data to study a wide range of topics, one of which is natural disasters. In existing research, a social network sensor was developed to separate natural disaster information from direct eyewitnesses, non-eyewitnesses, and non-disaster information. It can serve as a tool for early warning or monitoring when natural disasters occur. The main component of the social network sensor is text tweet classification. As in text classification research generally, the challenge lies in the feature extraction method that converts Twitter text into structured data. The strategy commonly used is vector space representation; however, it can produce high-dimensional data. This research focuses on feature extraction methods that resolve the high-dimensionality issue. We propose a hybrid of word2vec-based and lexicon-based feature extraction to produce new features. The experimental results show that the proposed method yields fewer features (150) while improving classification performance, with an average AUC of 0.84; this value is obtained using only the word2vec-based method. In the end, this research shows that the lexicon-based features did not improve the performance of social network sensor predictions for natural disasters.
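A hybrid word2vec-plus-lexicon feature vector of the kind described can be sketched by concatenating the averaged word vectors of a tweet with a count of lexicon hits. Everything below — the toy vectors, the tiny disaster lexicon, the function names — is illustrative, not the authors' code:

```python
def avg_vector(tokens, vecs, dim):
    """Average word vectors over tokens; zero vector if none are known."""
    vs = [vecs[t] for t in tokens if t in vecs]
    if not vs:
        return [0.0] * dim
    return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

def tweet_features(tokens, vecs, lexicon, dim=2):
    """Concatenate the averaged word2vec vector with a lexicon-hit count."""
    dense = avg_vector(tokens, vecs, dim)
    lex_count = sum(1 for t in tokens if t in lexicon)
    return dense + [float(lex_count)]

# Toy 2-d embeddings and a tiny disaster lexicon (both illustrative).
vecs = {"flood": [0.9, 0.1], "earthquake": [0.8, 0.2], "help": [0.1, 0.9]}
lexicon = {"flood", "earthquake", "tsunami"}
print(tweet_features(["flood", "help"], vecs, lexicon))  # [0.5, 0.5, 1.0]
```

The dense part stays at the embedding dimensionality regardless of vocabulary size, which is how this style of feature extraction avoids the high-dimensional vectors of a bag-of-words representation.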
HIGHLIGHTS
Implementations of text classification are generally used for sentiment analysis; it is still rare to apply text classification to identifying direct eyewitnesses in natural disaster cases
A common problem in text mining research is that features extracted with the vector space representation method generate high-dimensional data
A hybrid word2vec-based and lexicon-based feature extraction experiment was conducted to find a method that generates new, low-dimensional features while also improving classification performance
GRAPHICAL ABSTRACT
“…We compared our opinion words polarity from HEOLS with the opinion word polarity from: 1) Opinion Lexicon; 2) the first sense of adjective word SentiWordNet [30] (positive if the SentiWordNet score > 0 and vice versa), we use SentiWordNet because it was used in previous research [31,32]; and 3) same as in point 2 but we add Word Sense Disambiguation (WSD) using Adapted Lesk [33] to improve the performance [34][35][36].…”
Many restaurant review analyses have been conducted; however, only a few address specific aspects of a restaurant. This paper proposes aspect-based restaurant analysis covering physical environment, food quality, service quality, and price fairness. The analysis steps include Aspect Term Extraction (ATE), Aspect Keyword Extraction (AKE), Aspect Categorization (AC), and Sentiment Analysis (SA). ATE employs a modification of the Double Propagation method together with several topic modelling methods, AKE utilizes Term Frequency-Inverse Cluster Frequency (TF-ICF), in AC we propose Hybrid ELMo-Wikipedia (HEW), and in SA we propose Hybrid Expanded Opinion Lexicon-SentiCircle (HEOLS). The results show that our modification of the methods used in ATE increases the F1-measure of AC by 2% on average, and the proposed HEW achieves a better F1-measure than other similar methods by 6% on average. In addition, our proposed HEOLS can expand and re-determine the Opinion Lexicon polarity and increases the F1-measure of SA by 6%.
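TF-ICF, mentioned for the AKE step, weights a term by its frequency within a cluster and by the inverse of the number of clusters containing it — analogous to TF-IDF with clusters in place of documents. A minimal sketch of that weighting (the cluster data below is illustrative, not from the paper):

```python
import math

def tf_icf(term, cluster_tokens, clusters):
    """Term frequency in one cluster times log inverse cluster frequency."""
    tf = cluster_tokens.count(term)
    cf = sum(1 for c in clusters if term in c)  # clusters containing the term
    if tf == 0 or cf == 0:
        return 0.0
    return tf * math.log(len(clusters) / cf)

# Illustrative token clusters, one per aspect.
clusters = [
    ["food", "tasty", "food"],      # food-quality cluster
    ["service", "slow", "waiter"],  # service cluster
    ["food", "price", "cheap"],     # price cluster
]
print(tf_icf("tasty", clusters[0], clusters))  # distinctive term: high weight
print(tf_icf("food", clusters[0], clusters))   # shared across clusters: lower weight
```

Terms that appear in every cluster score zero (log 1 = 0), so the weighting favors keywords that characterize one aspect cluster.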