2022
DOI: 10.12913/22998624/152453
Experimental Comparison of Pre-Trained Word Embedding Vectors of Word2Vec, Glove, FastText for Word Level Semantic Text Similarity Measurement in Turkish

Abstract: This study aims to evaluate experimentally the word vectors produced by three widely used embedding methods for the word-level semantic text similarity in Turkish. Three benchmark datasets SimTurk, AnlamVer, and RG65_Turkce are used in this study to evaluate the word embedding vectors produced by three different methods namely Word2Vec, Glove, and FastText. As a result of the comparative analysis, Turkish word vectors produced with Glove and FastText gained better correlation in the word level semantic similar…
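The evaluation protocol the abstract describes — scoring word pairs by the cosine similarity of their embedding vectors and correlating those scores with human similarity judgments — can be sketched as follows. The three-dimensional vectors and gold scores below are invented for illustration; the actual study used pre-trained Word2Vec, GloVe, and FastText vectors and the SimTurk, AnlamVer, and RG65_Turkce benchmark datasets.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; illustration only)."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy embedding table standing in for pre-trained vectors (invented values).
vectors = {
    "kedi":  [0.9, 0.1, 0.0],   # "cat"
    "köpek": [0.8, 0.2, 0.1],   # "dog"
    "araba": [0.0, 0.1, 0.9],   # "car"
}

# Hypothetical gold similarity scores for word pairs, in the style of
# SimTurk-like benchmarks (pair, human judgment).
pairs = [("kedi", "köpek", 0.85), ("kedi", "araba", 0.10), ("köpek", "araba", 0.15)]

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
gold_scores = [g for _, _, g in pairs]

# The correlation between model scores and human judgments is the metric
# the study reports per embedding method.
rho = spearman(model_scores, gold_scores)
```

With real pre-trained vectors, the same loop would run over every pair in a benchmark, and the resulting correlation would be compared across Word2Vec, GloVe, and FastText.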

Cited by 6 publications (3 citation statements) · References 14 publications
“…Several studies have been conducted, spanning diverse linguistic contexts and applications, to assess the efficacy of various word embedding models. Investigations have ranged from comparing pre-trained word embedding vectors for word-level semantic text similarity in Turkish [26] to evaluating Neural Machine Translation (NMT) for languages such as English and Hindi [27]. Additionally, an exploration of the accuracy of three prominent word embedding models within the context of Convolutional Neural Network (CNN) text classification [28] has been undertaken.…”
Section: Word Embedding
confidence: 99%
“…In the BoW approach, the keywords are compared simply on the basis of their occurrences and not on their actual meanings. On the other hand, the use of semantically meaningful word embeddings such as GloVe [6] and Word2Vec [7] facilitates semantic similarity matching between documents for text classification [8,9]. Recurrent neural networks such as the Long Short-Term Memory (LSTM) [10] or transformers [11] are typically used to extract useful information from the sequence of word embeddings emanating from each document [12].…”
Section: Introduction
confidence: 99%
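The contrast drawn in the snippet above — BoW matching on raw token overlap versus embedding-based semantic matching — can be illustrated with a minimal sketch. The two-dimensional vectors are invented for illustration; real GloVe or Word2Vec embeddings have hundreds of dimensions and are learned from corpora.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy embeddings: synonyms get nearby vectors (invented values).
emb = {"car": [1.0, 0.0], "automobile": [0.95, 0.1], "banana": [0.0, 1.0]}

def avg_embedding(doc):
    """A common simple document representation: the mean of its word vectors."""
    vecs = [emb[w] for w in doc]
    return [sum(c) / len(vecs) for c in zip(*vecs)]

doc_a = ["car"]
doc_b = ["automobile"]
doc_c = ["banana"]

# BoW (Jaccard) overlap: "car" and "automobile" share no tokens, so the
# score is zero even though the documents mean the same thing.
bow_sim = len(set(doc_a) & set(doc_b)) / len(set(doc_a) | set(doc_b))

# Embedding cosine: synonyms score high, unrelated words score low.
emb_sim_ab = cosine(avg_embedding(doc_a), avg_embedding(doc_b))
emb_sim_ac = cosine(avg_embedding(doc_a), avg_embedding(doc_c))
```

This is the failure mode of occurrence-based matching that motivates the move to semantically meaningful embeddings; sequence models such as LSTMs or transformers then operate on the per-word vectors rather than the averaged document vector.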
“…Machine learning is frequently used as a tool for NLP tasks, so the two fields overlap to some degree. NLP combines statistical models, machine learning, deep learning, and rule-based computational-linguistic modeling of human language (Tulu, 2022).…”
Section: Introduction
confidence: 99%