Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1042

A Discriminative Latent-Variable Model for Bilingual Lexicon Induction

Abstract: We introduce a novel discriminative latent-variable model for the task of bilingual lexicon induction. Our model combines the bipartite matching dictionary prior of Haghighi et al. (2008) with a state-of-the-art embedding-based approach. To train the model, we derive an efficient Viterbi EM algorithm. We provide empirical improvements on six language pairs under two metrics and show that the prior theoretically and empirically helps to mitigate the hubness problem. We also demonstrate how previous work may be vi…
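To make the abstract's central idea concrete, here is a minimal sketch (not the authors' Viterbi EM implementation) of inducing a lexicon as a one-to-one bipartite matching over cosine similarities in a shared embedding space. The helper name induce_lexicon, the toy random vectors, and the use of scipy.optimize.linear_sum_assignment are illustrative assumptions; the one-to-one constraint is also why a matching prior can help against hubness, since no single target word can be returned for many source words.

```python
# Minimal sketch (not the paper's implementation): bilingual lexicon induction
# as a one-to-one bipartite matching over embedding similarities, solved with
# the Hungarian algorithm via scipy.
import numpy as np
from scipy.optimize import linear_sum_assignment

def induce_lexicon(src_emb, tgt_emb):
    """src_emb: (n, d) source embeddings; tgt_emb: (m, d) target embeddings.
    Both are assumed to already live in a shared (projected) space."""
    # Cosine similarity between every source and target word.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    # Maximum-weight matching: each source word is paired with at most one
    # target word and vice versa, unlike unconstrained nearest-neighbour
    # retrieval, where a "hub" target can be the answer for many sources.
    rows, cols = linear_sum_assignment(-sim)
    return list(zip(rows, cols))

# Toy usage with random vectors standing in for pre-trained embeddings.
rng = np.random.default_rng(0)
print(induce_lexicon(rng.normal(size=(5, 50)), rng.normal(size=(6, 50))))
```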

Cited by 22 publications (20 citation statements)
References 18 publications
“…This yields a more nuanced picture of the known deficiency of word embeddings to underperform on infrequent words (Gong et al., 2018). Our findings also contradict the strong empirical claims made elsewhere in the literature (Artetxe et al., 2017; Conneau et al., 2018; Ruder et al., 2018; Grave et al., 2018b), as we observe that performance severely degrades when the evaluation includes rare morphological variants of a word and infrequent lexemes. We picture this general trend in Figure 1, which also highlights the skew of existing dictionaries towards more frequent words.…”
Section: Introduction (contrasting)
confidence: 55%
“…In this paper we ask whether current methods for bilingual lexicon induction (BLI) generalize morphologically. [Figure 1 (French-Spanish): The relation between the BLI performance and the frequency of source words in the test dictionary. The graph presents results for the model of Ruder et al. (2018) evaluated on both the MUSE dictionary (Conneau et al., 2018) and our morphologically complete dictionary, which contains many rare morphological variants of words. The numbers above the bars correspond to the number of translated source words (a hyphen represents an empty dictionary).]…”
Section: Introduction (mentioning)
confidence: 99%
“…We evaluate a range of both supervised and unsupervised projection-based CLE models. While supervised models (Mikolov et al., 2013; Smith et al., 2017; Ruder et al., 2018a) learn the projections using existing dictionaries, unsupervised models first induce seed dictionaries without bilingual data. We include unsupervised models with diverse dictionary induction strategies: adversarial learning (Conneau et al., 2018a), similarity-based heuristics (Artetxe et al., 2018b), PCA (Hoshen and Wolf, 2018), and optimal transport (Alvarez-Melis and Jaakkola, 2018).…”
Section: Projection-Based CLE Models (mentioning)
confidence: 99%
“…Recently, a large number of projection-based models have been proposed for inducing bilingual word embedding spaces (Smith et al., 2017; Conneau et al., 2018; Artetxe et al., 2018; Ruder et al., 2018a; Joulin et al., 2018, inter alia), most of them requiring limited (word-level) or no bilingual supervision. Based on a few thousand (manually created or automatically induced) word-translation pairs, these models learn a linear mapping W_g that projects the vectors from X_L2 to the space X_L1: g(X_L2) = X_L2 W_g.…”
Section: Lexical Constraints as Training Instances (mentioning)
confidence: 99%
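To make the mapping in the statement above concrete, here is a minimal sketch (an assumption, not code from any of the cited papers) that learns the linear mapping W_g from a seed dictionary using the closed-form orthogonal Procrustes solution, so that g(X_L2) = X_L2 W_g lies in the space of X_L1. The function name learn_mapping and the index-pair format of the seed dictionary are hypothetical.

```python
# Minimal sketch, assuming a seed dictionary of (language-2 index, language-1
# index) pairs: fit an orthogonal mapping W_g so that X_L2 @ W_g approximates
# the corresponding rows of X_L1 (closed-form orthogonal Procrustes).
import numpy as np

def learn_mapping(x_l1, x_l2, seed_pairs):
    """x_l1: (n1, d) embeddings of language 1; x_l2: (n2, d) of language 2;
    seed_pairs: iterable of (i_l2, j_l1) index pairs from a seed dictionary."""
    src = np.stack([x_l2[i] for i, _ in seed_pairs])  # rows of X_L2
    tgt = np.stack([x_l1[j] for _, j in seed_pairs])  # aligned rows of X_L1
    # Procrustes solution: W_g = U V^T, where U S V^T is the SVD of src^T tgt.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy usage: project all language-2 vectors into the language-1 space.
rng = np.random.default_rng(1)
x_l1, x_l2 = rng.normal(size=(100, 50)), rng.normal(size=(80, 50))
w_g = learn_mapping(x_l1, x_l2, seed_pairs=[(i, i) for i in range(20)])
projected = x_l2 @ w_g  # this is g(X_L2)
```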