Deep Multilingual Correlation for Improved Word Embeddings

Lu, Ang; Wang, Weiran; Bansal, Mohit; Gimpel, Kevin; Livescu, Karen

doi:10.3115/v1/n15-1028

Cited by 107 publications

(90 citation statements)

References 31 publications

“…In addition to that, we would like to explore non-linear transformations (Lu et al, 2015) and alternative dictionary induction methods (Smith et al, 2017). Finally, we would like to apply our model in the decipherment scenario (Dou et al, 2015).…”

Section: Discussion (mentioning)

confidence: 99%

Learning bilingual word embeddings with (almost) no bilingual data

Artetxe, Labaka, Agirre (2017)

Proceedings of the 55th Annual Meeting of the Association For Computational Linguistics (Volume 1: Long Papers)

391

523

Most methods to learn bilingual word embeddings rely on large parallel corpora, which is difficult to obtain for most language pairs. This has motivated an active research line to relax this requirement, with methods that use document-aligned corpora or bilingual dictionaries of a few thousand words instead. In this work, we further reduce the need of bilingual resources using a very simple self-learning approach that can be combined with any dictionary-based mapping technique. Our method exploits the structural similarity of embedding spaces, and works with as little bilingual evidence as a 25 word dictionary or even an automatically generated list of numerals, obtaining results comparable to those of systems that use richer resources.
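The self-learning idea in this abstract can be sketched as a loop that alternates between fitting a mapping on the current dictionary and re-inducing the dictionary from that mapping. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function names `fit_orthogonal_map` and `self_learn` are hypothetical, the mapping step is an orthogonal Procrustes fit (one possible dictionary-based mapping technique), dictionary induction is plain nearest-neighbour search by dot product, and the data is synthetic.

```python
import numpy as np

def fit_orthogonal_map(src_vecs, tgt_vecs):
    """Orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F (Procrustes)."""
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

def self_learn(X, Z, seed_pairs, n_iter=5):
    """Alternate between mapping estimation and dictionary induction.

    X, Z: row-normalized source and target embedding matrices (assumed).
    seed_pairs: list of (source_index, target_index) seed translations.
    """
    src_idx = np.array([s for s, _ in seed_pairs])
    tgt_idx = np.array([t for _, t in seed_pairs])
    for _ in range(n_iter):
        # Step 1: fit the mapping on the current dictionary.
        W = fit_orthogonal_map(X[src_idx], Z[tgt_idx])
        # Step 2: induce a new dictionary, one nearest target per source word.
        similarities = (X @ W) @ Z.T
        src_idx = np.arange(X.shape[0])
        tgt_idx = similarities.argmax(axis=1)
    return W, tgt_idx

# Synthetic check: the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
Z = X @ Q
seed = [(i, i) for i in range(25)]  # a 25-pair seed dictionary
W, induced = self_learn(X, Z, seed)
```

On this toy data the seed dictionary already determines the rotation, so the loop only confirms it; real embedding spaces are only approximately isometric, which is where the repeated re-induction is meant to help.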

“…There is an expansive body of research on learning multilingual word embeddings (Gouws et al, 2014; Faruqui and Dyer, 2014; Lu et al, 2015; Lauly et al, 2014; Luong et al, 2015). Previous work has shown its effectiveness across a wide range of multilingual transfer tasks including tagging (Kim et al, 2015), syntactic parsing (Xiao and Guo, 2014; Guo et al, 2015; Durrett et al, 2012), and machine translation (Zou et al, 2013; Mikolov et al, 2013b).…”

Section: Multilingual Word Embeddings (mentioning)

confidence: 99%

Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings

Zhang, Gaddy, Barzilay et al. (2016)

Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Langua

In the absence of annotations in the target language, multilingual models typically draw on extensive parallel resources. In this paper, we demonstrate that accurate multilingual part-of-speech (POS) tagging can be done with just a few (e.g., ten) word translation pairs. We use the translation pairs to establish a coarse linear isometric (orthonormal) mapping between monolingual embeddings. This enables the supervised source model expressed in terms of embeddings to be used directly on the target language. We further refine the model in an unsupervised manner by initializing and regularizing it to be close to the direct transfer model. Averaged across six languages, our model yields a 37.5% absolute improvement over the monolingual prototype-driven method (Haghighi and Klein, 2006) when using a comparable amount of supervision. Moreover, to highlight key linguistic characteristics of the generated tags, we use them to predict typological properties of languages, obtaining a 50% error reduction relative to the prototype model.
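One standard way to obtain an orthonormal mapping from a handful of translation pairs is the orthogonal Procrustes solution via SVD. The sketch below assumes that formulation for illustration only; the abstract does not say which estimator the authors use. The function name `orthogonal_map` and the synthetic data are hypothetical.

```python
import numpy as np

def orthogonal_map(src_vecs, tgt_vecs):
    """Return the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    src_vecs, tgt_vecs: (n_pairs, dim) arrays whose i-th rows hold the
    embeddings of the i-th translation pair.
    """
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Synthetic check with ten "translation pairs" in a 5-dimensional space.
rng = np.random.default_rng(0)
dim, n_pairs = 5, 10
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = rng.normal(size=(n_pairs, dim))
tgt = src @ true_rotation  # target embeddings are a rotated copy of the source
W = orthogonal_map(src, tgt)
```

Because `W` is orthogonal, it preserves distances and angles between source vectors, which is the sense in which such a mapping is isometric.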

“…Most methods rely on supervision encoded in parallel data, at the document level (Vulić and Moens, 2015), the sentence level (Zou et al, 2013; Chandar A P et al, 2014; Hermann and Blunsom, 2014; Kočiský et al, 2014; Luong et al, 2015; Coulmance et al, 2015; Oshikiri et al, 2016), or the word level (i.e. in the form of seed lexicon) (Gouws and Søgaard, 2015; Wick et al, 2016; Duong et al, 2016; Shi et al, 2015; Mikolov et al, 2013a; Faruqui and Dyer, 2014; Lu et al, 2015; Ammar et al, 2016; Zhang et al, 2016a, 2017; Smith et al, 2017).…”

Section: Bilingual Lexicon Induction (mentioning)

confidence: 99%

“…a linear map, to connect separately trained word embeddings cross-lingually. Learning such a transformation typically calls for cross-lingual supervision from parallel data (Faruqui and Dyer, 2014; Lu et al, 2015; Smith et al, 2017).…”

Section: Introduction (mentioning)

confidence: 99%

Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction

Zhang, Liu, Luan et al. (2017)

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

107

102

Cross-lingual natural language processing hinges on the premise that there exists invariance across languages. At the word level, researchers have identified such invariance in the word embedding semantic spaces of different languages. However, in order to connect the separate spaces, cross-lingual supervision encoded in parallel data is typically required. In this paper, we attempt to establish the cross-lingual connection without relying on any cross-lingual supervision. By viewing word embedding spaces as distributions, we propose to minimize their earth mover's distance, a measure of divergence between distributions. We demonstrate the success on the unsupervised bilingual lexicon induction task. In addition, we reveal an interesting finding that the earth mover's distance shows potential as a measure of language difference.
