2018
DOI: 10.1109/taslp.2018.2837384

Semantic Structure and Interpretability of Word Embeddings

Abstract: Dense word embeddings, which encode meanings of words to low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensio…


Cited by 74 publications (36 citation statements)
References 23 publications
“…In this kind of work, a word embedding model may be deemed more interpretable if humans are better able to identify the intruding words. Since this evaluation is costly for high-dimensional representations, alternative automatic metrics have been considered (Park et al., 2017; Senel et al., 2018).…”
Section: Other Methods (mentioning)
confidence: 99%
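The word intrusion test mentioned above has a standard construction: show a reader the words ranked highest on one embedding dimension plus one low-ranked "intruder", and check whether the intruder is easy to spot. A minimal sketch of building one such question is below; the function name, the top-k choice, and the bottom-half intruder pool are assumptions for illustration, not the exact protocol of the cited works.

```python
import numpy as np

def intrusion_instance(E, vocab, dim, k=5, rng=None):
    """Build one word-intrusion question for a given embedding dimension.

    E: (V, d) embedding matrix; vocab: list of V words.
    Returns the shuffled answer options and the intruding word.
    """
    rng = rng or np.random.default_rng(0)
    order = np.argsort(E[:, dim])           # indices, ascending by dimension value
    top = [vocab[i] for i in order[-k:]]    # words this dimension ranks highest
    intruder = vocab[rng.choice(order[: len(vocab) // 2])]  # drawn from the bottom half
    options = top + [intruder]
    rng.shuffle(options)                    # hide the intruder's position
    return options, intruder
```

If the dimension is interpretable, its top words share an obvious theme and the intruder stands out; automatic metrics approximate this human judgment.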
“…If a base has no concept assigned to it, the average precision and the reciprocal rank of that base are set to zero. As for recall-oriented metrics, similarly to Senel et al. (2017), train and test words are randomly selected (60%, 40%) for each concept before the assignment takes place. On average, each concept has 40 test words.…”
Section: Discussion (mentioning)
confidence: 99%
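A minimal sketch of the per-concept 60%/40% train/test split this statement describes, assuming concepts are stored as a dict from concept name to word list (a hypothetical structure; the cited work uses SEMCAT categories):

```python
import random

def split_concepts(concept_words, train_frac=0.6, seed=0):
    """Randomly split each concept's word list into train and test sets."""
    rng = random.Random(seed)
    train, test = {}, {}
    for concept, words in concept_words.items():
        shuffled = words[:]          # copy so the input is left untouched
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train[concept] = shuffled[:cut]
        test[concept] = shuffled[cut:]   # remaining ~40% held out for evaluation
    return train, test
```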
“…They introduce a new dataset (SEMCAT) of 6,500 words described with 110 categories as the knowledge base. Senel et al. (2017) consider dense word embeddings. In contrast, our paper investigates sparse word embeddings from multiple aspects, and it is based on ConceptNet, which is much larger and richer, but also noisier, than SEMCAT.…”
Section: Related Work (mentioning)
confidence: 99%
“…Word embeddings recognize the distribution of words with similar meanings, which is then captured in a vector model (Şenel, Utlu, Yücesoy, Koç, & Çukur, 2018). To capture the characteristics of words, both the original word and similar words, the similarity between one word and another must be computed.…”
Section: Word Embedding (unclassified)
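The word-to-word similarity this statement refers to is typically computed as the cosine similarity between embedding vectors. A minimal sketch, with illustrative 4-dimensional vectors standing in for real embeddings (which usually have 100-300 dimensions):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings for two semantically related words.
king = np.array([0.6, 0.2, 0.8, 0.1])
queen = np.array([0.5, 0.3, 0.7, 0.2])
print(cosine_similarity(king, queen))  # close to 1.0 for similar words
```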
“…Therefore, a method is needed that can address this problem by using word embeddings, which can capture semantic and syntactic information about words from a large unlabeled corpus. With this method, a system can perform natural language processing (NLP) (Dalpiaz, Ferrari, Franch, & Palomares, 2018) by extracting information from the language and identifying the semantic relationships between words (Şenel, Utlu, Yücesoy, Koç, & Çukur, 2018). The information for these words is represented as individual vectors.…”
(unclassified)