Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP 2016
DOI: 10.18653/v1/w16-2521

Evaluating word embeddings with fMRI and eye-tracking

Abstract: The workshop CfP assumes that downstream evaluation of word embeddings is impractical, and that a valid evaluation metric for pairs of word embeddings can be found. I argue below that if so, the only meaningful evaluation procedure is comparison with measures of human word processing in the wild. Such evaluation is non-trivial, but I present a practical procedure here, evaluating word embeddings as features in a multi-dimensional regression model predicting brain imaging or eye-tracking word-level aggregate statistics…
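
To make the proposed procedure concrete, the sketch below regresses from embedding dimensions to a word-level gaze aggregate and scores the embeddings by cross-validated predictive fit. This is a minimal Python illustration with synthetic stand-in data; the variable names, the ridge regularizer, and the fold count are assumptions, not the paper's actual setup.

# Minimal sketch: score an embedding matrix by how well a regularized linear
# model predicts a per-word behavioral statistic (here, a stand-in for mean
# first-fixation duration). All data below is synthetic and illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-ins for real resources: one embedding row per word type, and one
# gaze statistic per word, aggregated over readers.
n_words, n_dims = 1000, 50
embeddings = rng.normal(size=(n_words, n_dims))   # hypothetical embeddings
gaze_ms = rng.normal(220.0, 40.0, size=n_words)   # hypothetical fixation times

# Cross-validated R^2: a higher score means the embedding space is a better
# linear predictor of the behavioral signal. Ridge keeps the multi-dimensional
# regression stable when embedding dimensions are correlated.
scores = cross_val_score(Ridge(alpha=1.0), embeddings, gaze_ms,
                         cv=10, scoring="r2")
print(f"mean cross-validated R^2: {scores.mean():.3f}")

Under this scheme it is the comparison of scores across embedding models, not any absolute value, that constitutes the evaluation.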

Cited by 35 publications (37 citation statements: 1 supporting, 36 mentioning, 0 contrasting) · References 15 publications

“…Metrics have been proposed based on co-occurrences (perplexity or word error rate), based on ability to discriminate between contexts (e.g., topic classification), and based on lexical semantics (predicting links in lexical knowledge bases). Søgaard (2016) argues that such metrics are not valid, because co-occurrences, contexts, and lexical knowledge bases are also used to induce word embeddings, and that downstream evaluation is the best way to evaluate word embeddings. The only task-independent evaluation of embeddings that is reasonable, he claims, is to evaluate word embeddings by how well they predict behavioral observations, e.g.…”
Section: Applications
Citation type: mentioning
Confidence: 99%
“…Cognitive lexical semantics proposes that words are defined by how they are organized in the brain (Miller and Fellbaum, 1992). As a result, brain activity data recorded from humans processing language is arguably the most accurate mental lexical representation available (Søgaard, 2016). Recordings of brain activity play a central role in furthering our understanding of how human language works.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…Based on this evidence, it could be concluded that the characteristics of formulaic language could be captured through differences in the gaze patterns between formulaic and non-formulaic sequences. In a similar way, gaze data has previously been successfully used in other NLP tasks such as part-of-speech tagging (Barrett et al, 2016a) and evaluation of word embeddings (Søgaard, 2016), and it has been shown that gaze signals transfer across languages (Barrett et al, 2016b). In this sense, automatically identifying formulaic sequences based on gaze features could not only contribute to potentially improving classification accuracy and gaining insight into the cognitive processing of such units, but can also provide a language-independent approach to identification of formulaic phrases.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
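
As a rough illustration of the last point above, the sketch below fits a linear classifier on a few per-sequence gaze features to separate formulaic from non-formulaic sequences. The feature set, the labels, and the data are hypothetical stand-ins; the cited work's actual features and models may differ.

# Illustrative only: classify word sequences as formulaic vs. non-formulaic
# from hypothetical gaze features. Synthetic data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_seqs = 400

# Hypothetical per-sequence gaze features: total fixation time (ms), mean
# first-fixation duration (ms), and count of regressions into the sequence.
X = np.column_stack([
    rng.normal(600.0, 150.0, n_seqs),
    rng.normal(210.0, 35.0, n_seqs),
    rng.poisson(1.5, n_seqs),
])
y = rng.integers(0, 2, size=n_seqs)  # dummy labels: 1 = formulaic

# Cross-validated accuracy of the gaze-feature classifier.
acc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                      cv=5, scoring="accuracy")
print(f"mean cross-validated accuracy: {acc.mean():.3f}")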