Abstract: Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence, often characterized as AI-hard or AI-complete. Many problems can be addressed with WSD approaches, including sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. In this paper, we tackle the WSD problem with two small corpora. We propose the use of Word2vec and Wikipedia to develop the corpora. After developing the corpora, …
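The abstract above describes disambiguating word senses with Word2vec vectors built from Wikipedia-derived corpora. A minimal sketch of the general idea (not the authors' implementation — the toy vectors, sense glosses, and function names below are illustrative) is to average the embeddings of the context words and pick the sense whose gloss vector is closest by cosine similarity:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def avg_vector(words, vecs):
    """Average the embeddings of the words that have a vector."""
    vs = [vecs[w] for w in words if w in vecs]
    if not vs:
        return None
    dim = len(vs[0])
    return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

def disambiguate(context, sense_glosses, vecs):
    """Pick the sense whose gloss vector is closest to the context vector."""
    ctx = avg_vector(context, vecs)
    best, best_score = None, -2.0
    for sense, gloss in sense_glosses.items():
        g = avg_vector(gloss, vecs)
        if ctx is None or g is None:
            continue
        score = cosine(ctx, g)
        if score > best_score:
            best, best_score = sense, score
    return best

# Toy 2-d embeddings standing in for Word2vec vectors trained on Wikipedia.
vecs = {
    "money": [1.0, 0.1], "deposit": [0.9, 0.2], "loan": [0.95, 0.15],
    "river": [0.1, 1.0], "water": [0.2, 0.9], "shore": [0.15, 0.95],
}
# Hypothetical gloss words for two senses of "bank".
senses = {
    "bank/finance": ["money", "deposit", "loan"],
    "bank/river": ["river", "water", "shore"],
}
print(disambiguate(["deposit", "money"], senses, vecs))  # bank/finance
```

With real Word2vec vectors the comparison runs over full-dimensional embeddings and richer sense glosses, but the scoring step is the same.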
“…A paraphrase lexicon is also needed to detect phrases in the document. Several previous studies applied word sense disambiguation and were shown to improve results and contextual understanding [41,42].…”
Document searching with queries that capture context can better reflect the user's intent when retrieving documents. Many studies have addressed understanding the context of a query, but differences across languages can require different methods of context understanding; the methods used in previous studies therefore need improvement. In this paper, we propose a query expansion method based on BabelNet search and Word Embedding (BabelNet Embedding). The query expansion method focuses on developing queries based on the semantic relationships within a query in order to understand its context. Candidate queries were developed by finding synonyms, measuring similarity using WordNet, Word Embedding over all Wikipedia articles, and BabelNet Embedding over Wikipedia Online articles. We compared our proposed method with an existing semantic query expansion method. Our method retrieved relevant documents more accurately, achieving 89% accuracy when searching Arabic documents.
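One common form of embedding-filtered query expansion, loosely in the spirit of the method described above, is to gather synonym candidates (e.g., from WordNet or BabelNet) and keep only those whose embedding similarity to the original term passes a threshold. A minimal sketch, assuming a hand-built synonym table and toy vectors (both illustrative, not the authors' resources):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_terms, synonyms, vecs, threshold=0.75):
    """Expand a query with synonym candidates that pass an embedding-similarity check."""
    expanded = list(query_terms)
    for term in query_terms:
        for cand in synonyms.get(term, []):
            if term in vecs and cand in vecs and cosine(vecs[term], vecs[cand]) >= threshold:
                expanded.append(cand)
    return expanded

# Toy synonym table standing in for WordNet/BabelNet lookups.
synonyms = {"car": ["automobile", "plant"]}
# Toy embeddings; "plant" is deliberately a poor match for "car".
vecs = {"car": [1.0, 0.0], "automobile": [0.95, 0.1], "plant": [0.5, 0.5]}
print(expand_query(["car"], synonyms, vecs))  # ['car', 'automobile']
```

The similarity filter is what keeps spurious lexical synonyms (here, "plant") out of the expanded query.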
“…The above technique is also used for measuring document similarity [8]. This technique is also used to extract Twitter data features for crisis event classification [9].…”
Researchers have collected Twitter data to study a wide range of topics, one of which is natural disasters. In existing research, a social network sensor was developed to separate natural disaster information from direct eyewitnesses, non-eyewitnesses, and non-disaster information. It can serve as a tool for early warning or monitoring when natural disasters occur. The main component of the social network sensor is text tweet classification. As in text classification research generally, the challenge lies in the feature extraction method that converts Twitter text into structured data. The strategy commonly used is vector space representation; however, it can produce high-dimensional data. This research focuses on feature extraction methods that resolve the high-dimensionality issue. We propose a hybrid of word2vec-based and lexicon-based feature extraction to produce new features. The experimental results show that the proposed method yields fewer features (150) while improving classification performance, with an average AUC of 0.84; this value is obtained using only the word2vec-based method. In the end, this research shows that the lexicon-based features did not improve the performance of social network sensor predictions for natural disasters.
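A hybrid word2vec-plus-lexicon feature vector of the kind described can be sketched by concatenating the averaged word vectors of a tweet with a count of lexicon hits. Everything below — the toy vectors, the tiny disaster lexicon, the function names — is illustrative, not the authors' code:

```python
def avg_vector(tokens, vecs, dim):
    """Average word vectors over tokens; zero vector if none are known."""
    vs = [vecs[t] for t in tokens if t in vecs]
    if not vs:
        return [0.0] * dim
    return [sum(v[i] for v in vs) / len(vs) for i in range(dim)]

def tweet_features(tokens, vecs, lexicon, dim=2):
    """Concatenate the averaged word2vec vector with a lexicon-hit count."""
    dense = avg_vector(tokens, vecs, dim)
    lex_count = sum(1 for t in tokens if t in lexicon)
    return dense + [float(lex_count)]

# Toy 2-d embeddings and a tiny disaster lexicon (both illustrative).
vecs = {"flood": [0.9, 0.1], "earthquake": [0.8, 0.2], "help": [0.1, 0.9]}
lexicon = {"flood", "earthquake", "tsunami"}
print(tweet_features(["flood", "help"], vecs, lexicon))  # [0.5, 0.5, 1.0]
```

The dense part stays at the embedding dimensionality regardless of vocabulary size, which is how this style of feature extraction avoids the high-dimensional vectors of a bag-of-words representation.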
HIGHLIGHTS
Implementations of text classification are generally used for sentiment analysis; it is still rare to apply text classification to identifying direct eyewitnesses in natural disaster cases
A common problem in text mining research is that features extracted with the vector space representation method generate high-dimensional data
A hybrid word2vec-based and lexicon-based feature extraction experiment was conducted to find a method that generates new, low-dimensional features while also improving classification performance
GRAPHICAL ABSTRACT
“…We compared our opinion words polarity from HEOLS with the opinion word polarity from: 1) Opinion Lexicon; 2) the first sense of adjective word SentiWordNet [30] (positive if the SentiWordNet score > 0 and vice versa), we use SentiWordNet because it was used in previous research [31,32]; and 3) same as in point 2 but we add Word Sense Disambiguation (WSD) using Adapted Lesk [33] to improve the performance [34][35][36].…”
Many restaurant review analyses have been conducted; however, only a few address specific aspects of a restaurant. This paper proposes aspect-based restaurant analysis covering physical environment, food quality, service quality, and price fairness. The analysis steps include Aspect Term Extraction (ATE), Aspect Keyword Extraction (AKE), Aspect Categorization (AC), and Sentiment Analysis (SA). ATE employs a modification of the Double Propagation method together with several topic modelling methods, AKE utilizes Term Frequency-Inverse Cluster Frequency (TF-ICF), in AC we propose Hybrid ELMo-Wikipedia (HEW), and in SA we propose Hybrid Expanded Opinion Lexicon-SentiCircle (HEOLS). The results show that our modification of the methods used in ATE increases the F1-measure of AC by 2% on average, and the proposed HEW achieves a better F1-measure than other similar methods by 6% on average. In addition, our proposed HEOLS can expand and re-determine the Opinion Lexicon polarity and increases the F1-measure of SA by 6%.
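TF-ICF, mentioned for the AKE step, weights a term by its frequency within a cluster and by the inverse of the number of clusters containing it — analogous to TF-IDF with clusters in place of documents. A minimal sketch of that weighting (the cluster data below is illustrative, not from the paper):

```python
import math

def tf_icf(term, cluster_tokens, clusters):
    """Term frequency in one cluster times log inverse cluster frequency."""
    tf = cluster_tokens.count(term)
    cf = sum(1 for c in clusters if term in c)  # clusters containing the term
    if tf == 0 or cf == 0:
        return 0.0
    return tf * math.log(len(clusters) / cf)

# Illustrative token clusters, one per aspect.
clusters = [
    ["food", "tasty", "food"],      # food-quality cluster
    ["service", "slow", "waiter"],  # service cluster
    ["food", "price", "cheap"],     # price cluster
]
print(tf_icf("tasty", clusters[0], clusters))  # distinctive term: high weight
print(tf_icf("food", clusters[0], clusters))   # shared across clusters: lower weight
```

Terms that appear in every cluster score zero (log 1 = 0), so the weighting favors keywords that characterize one aspect cluster.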