2018
DOI: 10.1109/taslp.2018.2837384

Semantic Structure and Interpretability of Word Embeddings

Abstract: Dense word embeddings, which encode meanings of words to low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensio…


Cited by 74 publications (36 citation statements)
References 23 publications
“…In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since this evaluation is costly for high-dimensional representations, alternative automatic metrics have been considered (Park et al., 2017; Senel et al., 2018).…”
Section: Other Methods (mentioning)
confidence: 99%
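The word intrusion test mentioned above has a standard construction: show a reader the words ranked highest on one embedding dimension plus one low-ranked "intruder", and check whether the intruder is easy to spot. A minimal sketch of building one such question is below; the function name, the top-k choice, and the bottom-half intruder pool are assumptions for illustration, not the exact protocol of the cited works.

```python
import numpy as np

def intrusion_instance(E, vocab, dim, k=5, rng=None):
    """Build one word-intrusion question for a given embedding dimension.

    E: (V, d) embedding matrix; vocab: list of V words.
    Returns the shuffled answer options and the intruding word.
    """
    rng = rng or np.random.default_rng(0)
    order = np.argsort(E[:, dim])           # indices, ascending by dimension value
    top = [vocab[i] for i in order[-k:]]    # words this dimension ranks highest
    intruder = vocab[rng.choice(order[: len(vocab) // 2])]  # drawn from the bottom half
    options = top + [intruder]
    rng.shuffle(options)                    # hide the intruder's position
    return options, intruder
```

If the dimension is interpretable, its top words share an obvious theme and the intruder stands out; automatic metrics approximate this human judgment.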
“…If a base has no concept assigned to it, the average precision and the reciprocal rank of that base are set to zero. As for recall-oriented metrics, similarly to Senel et al. (2017), train and test words are randomly selected (60%, 40%) for each concept before the assignment takes place. On average, each concept has 40 test words.…”
Section: Discussion (mentioning)
confidence: 99%
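A minimal sketch of the per-concept 60%/40% train/test split this statement describes, assuming concepts are stored as a dict from concept name to word list (a hypothetical structure; the cited work uses SEMCAT categories):

```python
import random

def split_concepts(concept_words, train_frac=0.6, seed=0):
    """Randomly split each concept's word list into train and test sets."""
    rng = random.Random(seed)
    train, test = {}, {}
    for concept, words in concept_words.items():
        shuffled = words[:]          # copy so the input is left untouched
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train[concept] = shuffled[:cut]
        test[concept] = shuffled[cut:]   # remaining ~40% held out for evaluation
    return train, test
```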
“…They introduce a new dataset (SEMCAT) of 6,500 words described with 110 categories as the knowledge base. Senel et al. (2017) consider dense word embeddings. In contrast, our paper investigates sparse word embeddings from multiple aspects, and it is based on ConceptNet, which is much larger and richer, but also noisier, than SEMCAT.…”
Section: Related Work (mentioning)
confidence: 99%
“…Word embeddings recognize the distribution of words with similar meanings, which is then captured in a vector model (Şenel, Utlu, Yücesoy, Koç, & Çukur, 2018). To capture the characteristics of words, both the original word and similar words, the similarity between one word and another must be computed.…”
Section: Word Embedding (unclassified)
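The word-to-word similarity this statement refers to is typically computed as the cosine similarity between embedding vectors. A minimal sketch, with illustrative 4-dimensional vectors standing in for real embeddings (which usually have 100-300 dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings for two semantically related words.
king = np.array([0.6, 0.2, 0.8, 0.1])
queen = np.array([0.5, 0.3, 0.7, 0.2])
print(cosine_similarity(king, queen))  # close to 1.0 for similar words
```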
“…Therefore, a method is needed that can address this problem by using word embeddings, which can capture semantic and syntactic information about words from a large unlabeled corpus. With this method, a system can perform natural language processing (NLP) (Dalpiaz, Ferrari, Franch, & Palomares, 2018) by extracting information from the language and identifying the semantic relationships between words (Şenel, Utlu, Yücesoy, Koç, & Çukur, 2018). The information for these words is represented as individual vectors.…”
(unclassified)