2022
DOI: 10.12913/22998624/152453
Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish

Abstract: This study aims to evaluate experimentally the word vectors produced by three widely used embedding methods for the word-level semantic text similarity in Turkish. Three benchmark datasets SimTurk, AnlamVer, and RG65_Turkce are used in this study to evaluate the word embedding vectors produced by three different methods namely Word2Vec, Glove, and FastText. As a result of the comparative analysis, Turkish word vectors produced with Glove and FastText gained better correlation in the word level semantic similar…
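The evaluation protocol the abstract describes — scoring word pairs by the cosine similarity of their embedding vectors and correlating those scores with human similarity judgments — can be sketched as follows. The three-dimensional vectors and gold scores below are invented for illustration; the actual study used pre-trained Word2Vec, GloVe, and FastText vectors and the SimTurk, AnlamVer, and RG65_Turkce benchmark datasets.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; illustration only)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy embedding table standing in for pre-trained vectors (invented values).
vectors = {
    "kedi":  [0.9, 0.1, 0.0],   # "cat"
    "köpek": [0.8, 0.2, 0.1],   # "dog"
    "araba": [0.0, 0.1, 0.9],   # "car"
}

# Hypothetical gold similarity scores for word pairs, in the style of
# SimTurk-like benchmarks (pair, human judgment).
pairs = [("kedi", "köpek", 0.85), ("kedi", "araba", 0.10), ("köpek", "araba", 0.15)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]

# The correlation between model scores and human judgments is the metric
# the study reports per embedding method.
rho = spearman(model_scores, gold_scores)
```

With real pre-trained vectors, the same loop would run over every pair in a benchmark, and the resulting correlation would be compared across Word2Vec, GloVe, and FastText.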

Cited by 6 publications (3 citation statements) · References 14 publications
“…Several studies have been conducted, spanning diverse linguistic contexts and applications, to assess the efficacy of various word embedding models. Investigations have ranged from comparing pre-trained word embedding vectors for word-level semantic text similarity in Turkish [26] to evaluating Neural Machine Translation (NMT) for languages such as English and Hindi [27]. Additionally, an exploration of the accuracy of three prominent word embedding models within the context of Convolutional Neural Network (CNN) text classification [28] has been undertaken.…”
Section: Word Embedding
confidence: 99%
“…In the BoW approach, the keywords are compared simply on the basis of their occurrences and not on their actual meanings. On the other hand, the use of semantically meaningful word embeddings such as GloVe [6] and Word2Vec [7] facilitates semantic similarity matching between documents for text classification [8,9]. Recurrent neural networks such as the Long Short-Term Memory (LSTM) [10] or transformers [11] are typically used to extract useful information from the sequence of word embeddings emanating from each document [12].…”
Section: Introduction
confidence: 99%
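The contrast drawn in the snippet above — BoW matching on raw token overlap versus embedding-based semantic matching — can be illustrated with a minimal sketch. The two-dimensional vectors are invented for illustration; real GloVe or Word2Vec embeddings have hundreds of dimensions and are learned from corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings: synonyms get nearby vectors (invented values).
emb = {"car": [1.0, 0.0], "automobile": [0.95, 0.1], "banana": [0.0, 1.0]}

def avg_embedding(doc):
    """A common simple document representation: the mean of its word vectors."""
    vecs = [emb[w] for w in doc]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

doc_a = ["car"]
doc_b = ["automobile"]
doc_c = ["banana"]

# BoW (Jaccard) overlap: "car" and "automobile" share no tokens, so the
# score is zero even though the documents mean the same thing.
bow_sim = len(set(doc_a) & set(doc_b)) / len(set(doc_a) | set(doc_b))

# Embedding cosine: synonyms score high, unrelated words score low.
emb_sim_ab = cosine(avg_embedding(doc_a), avg_embedding(doc_b))
emb_sim_ac = cosine(avg_embedding(doc_a), avg_embedding(doc_c))
```

This is the failure mode of occurrence-based matching that motivates the move to semantically meaningful embeddings; sequence models such as LSTMs or transformers then operate on the per-word vectors rather than the averaged document vector.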
“…Machine learning is frequently used as a tool for NLP tasks, so the two fields overlap to some degree. NLP combines statistical models, machine learning, deep learning, and rule-based computational-linguistic modeling of human language (Tulu, 2022).…”
Section: Introduction
confidence: 99%