Proceedings of the 15th Conference of the European Chapter of The Association for Computational Linguistics: Volume 2 2017
DOI: 10.18653/v1/e17-2098

Bootstrapping Unsupervised Bilingual Lexicon Induction

Abstract: The task of unsupervised lexicon induction is to find translation pairs across monolingual corpora. We develop a novel method that creates seed lexicons by identifying cognates in the vocabularies of related languages on the basis of their frequency and lexical similarity. We apply bidirectional bootstrapping to a method which learns a linear mapping between context-based vector spaces. Experimental results on three language pairs show consistent improvement over prior work.
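The seed-lexicon idea in the abstract (pairing likely cognates by frequency and lexical similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two signals are scored or combined, so the similarity measure (`difflib.SequenceMatcher`), the rank-gap filter, both threshold values, and the function names `rank_by_frequency` and `seed_lexicon` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def rank_by_frequency(counts):
    """Map each word to its frequency rank (0 = most frequent)."""
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: rank for rank, word in enumerate(ordered)}


def seed_lexicon(src_counts, tgt_counts, min_similarity=0.5, max_rank_gap=1):
    """Pair each source word with its most similar target word of comparable rank.

    Both thresholds are hypothetical; the abstract gives no values.
    """
    src_rank = rank_by_frequency(src_counts)
    tgt_rank = rank_by_frequency(tgt_counts)
    seeds = {}
    for src, s_rank in src_rank.items():
        best_word, best_score = None, 0.0
        for tgt, t_rank in tgt_rank.items():
            # Frequency signal: only compare words with similar ranks.
            if abs(s_rank - t_rank) > max_rank_gap:
                continue
            # Lexical signal: string similarity in [0, 1].
            score = SequenceMatcher(None, src, tgt).ratio()
            if score >= min_similarity and score > best_score:
                best_word, best_score = tgt, score
        if best_word is not None:
            seeds[src] = best_word
    return seeds


# Toy word counts, invented for this example only.
english = {"night": 50, "water": 40, "mother": 30, "dog": 20}
german = {"nacht": 48, "wasser": 41, "mutter": 29, "hund": 18}
seeds = seed_lexicon(english, german)
print(seeds)
```

On this toy data the sketch pairs the three cognates and leaves "dog" unpaired, since "hund" is too dissimilar. In the setting the abstract describes, a seed lexicon of this kind would then supervise the linear mapping between the two vector spaces, with bootstrapping adding further pairs.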

Cited by 13 publications (13 citation statements)
References 9 publications
“…A generative model for inducing a bilingual lexicon from monolingual corpora by exploiting orthographic and contextual similarities of words in two different languages was proposed by Haghighi et al [171]. Many methods, based on edit-distance and orthographic similarity are proposed for using linguist feature for word alignments supervised and unsupervised methods [172][173][174]. Riley and Gildea [175] proposed method to utilise the orthographic information in word-embedding based bilingual lexicon induction.…”
Section: Orthographic Information In Unsupervised Machine Translation (mentioning)
confidence: 99%
“…One idea is to train a model using bilingual information from corpora aligned at the sentence level (Zou et al, 2013; Hermann and Blunsom, 2014; Luong et al, 2015) and document level (Vulic and Moens, 2016; Levy et al, 2017). Another is to exploit the isomorphic structure (Conneau et al, 2017; Artetxe et al, 2018), dictionary (Mikolov et al, 2013; Faruqui and Dyer, 2014; Huang et al, 2015; Zhang et al, 2016), shared cognate, vocab (Hauer et al, 2017; Smith et al, 2017), numeral (Artetxe et al, 2017) through ad-hoc projection.…”
Section: Related Work (mentioning)
confidence: 99%
“…DTLM achieved state-of-the-art results on several tasks in which plain word types constitute the transduction target strings. Finally, our data augmentation approach is inspired by the self-training approach of Hauer et al (2017).…”
Section: Prior Work (mentioning)
confidence: 99%