Proceedings of the 19th International Conference on Computational Linguistics - 2002
DOI: 10.3115/1072228.1072394

An approach based on multilingual thesauri and model combination for bilingual lexicon extraction

Abstract: This paper focuses on exploiting different models and methods in bilingual lexicon extraction, either from parallel or comparable corpora, in specialized domains. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to combine the different models for bilingual lexicon extraction is presented. Our results show that the combination of the models significantly improves results, and that the use of the hier…

Citations: cited by 51 publications (38 citation statements)
References: 7 publications (6 reference statements)

“…While the method was evaluated on single words, the results we obtained are of direct relevance to the practical task of acquisition of translations for multi-word domain terminology: multi-word terms tend to have low corpus frequencies and algorithms for acquisition of translation from comparable corpora have already been extended to multi-word terms (e.g. Déjean et al 2002).…”
Section: Discussion (mentioning)
confidence: 99%
“…To customise the translation matrix to the domain at hand, Rapp (1999) starts with only a small number of seed translation pairs and augments this translation matrix with more dimensions as the algorithm finds more equivalent terms in the corpus. Déjean et al (2002) enriched the translation matrix prepared from an available dictionary with a hierarchical multilingual thesaurus. A number of studies (Daille and Morin 2005; Robitaille et al 2006) augmented this approach with techniques for multi-word recognition and alignment to extract equivalents for multi-word expressions from comparable corpora.…”
Section: Related Work (mentioning)
confidence: 99%
“…This method is closely related to those proposed for automatic extraction of bilingual terminology from comparable corpora. Most of them are based on the idea that across languages there is a semantic correlation between the co-occurrences of words that are translations of each other (Rapp 1999; Fung 2000; Gaussier et al 2000, 2004; Déjean et al 2002). Searching for equivalents was also supported by single-word and multiword wordlists in Spanish and with the help of the reference corpus and enabled us to find corresponding Spanish terms for 80% of the English TSCs.…”
Section: Corpus Design Methodology and Data Selection (mentioning)
confidence: 99%
“…Finding translation pairs: A pair of words is treated as a translation pair when their context similarity is high. Various clues have been considered when computing similarities, such as the use of concept class information obtained from a multilingual thesaurus (Déjean, Gaussier, and Sadat 2002), co-occurrence models generated from aligned documents (Prochasson and Fung 2011), and transliteration information (Shao and Ng 2004).…”
Section: Context-similarity-based Extraction (mentioning)
confidence: 99%
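
The context-similarity idea that the citation statements above describe can be illustrated with a small sketch. This is not the method of Déjean et al (2002) or of any cited paper: it is a generic, simplified context-vector comparison in the spirit of the co-occurrence approach, and the toy sentences, the seed dictionary, the window size, and all function names are assumptions made up for illustration. It uses no thesaurus or model combination.

```python
from collections import Counter
from math import sqrt


def context_vector(target, sentences, window=2):
    """Count the words that co-occur with `target` within +/- `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def translate_vector(vec, seed_dict):
    """Map source-language context words into the target language.

    Words missing from the (hypothetical) seed dictionary are dropped.
    """
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(source_word, src_sents, tgt_sents, seed_dict, candidates, window=2):
    """Rank target-language candidates by similarity to the translated source context."""
    src_vec = translate_vector(context_vector(source_word, src_sents, window), seed_dict)
    scored = [(c, cosine(src_vec, context_vector(c, tgt_sents, window))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy comparable corpora (invented data, English and French).
src_sents = [
    ["the", "doctor", "treats", "the", "patient"],
    ["the", "doctor", "examines", "the", "patient"],
]
tgt_sents = [
    ["le", "docteur", "soigne", "le", "patient"],
    ["le", "boulanger", "vend", "le", "pain"],
]
# Hypothetical seed dictionary covering only a few context words.
seed_dict = {"the": "le", "treats": "soigne", "patient": "patient"}

ranked = rank_candidates("doctor", src_sents, tgt_sents, seed_dict, ["docteur", "boulanger"])
print(ranked)
```

On this toy data the candidate that shares both the function word and the content word "soigne" with the translated source context ranks first. In realistic settings raw counts are usually replaced by an association weighting, and function words are filtered out, since they inflate the similarity of unrelated candidates.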