Computational Linguistics and Intelligent Text Processing
DOI: 10.1007/978-3-540-78135-6_36

Learning Spanish-Galician Translation Equivalents Using a Comparable Corpus and a Bilingual Dictionary

Abstract: So far, research on extraction of translation equivalents from comparable, non-parallel corpora has not been very popular. The main reason was the poor results when compared to those obtained from aligned parallel corpora. The method proposed in this paper, relying on seed patterns generated from external bilingual dictionaries, allows us to achieve similar results to those from parallel corpus. In this way, the huge amount of comparable corpora available via Web can be viewed as a never-ending sourc…

Cited by 8 publications (7 citation statements)
References 18 publications
“…Most approaches to extract translation equivalents from monolingual corpora define the contextual distribution of a word by considering bilingual pairs of seed words. In most cases, seed words are provided by external bilingual dictionaries (Fung and McKeown 1997; Fung and Yee 1998; Rapp 1999; Chiao and Zweigenbaum 2002; Shao and Ng 2004; Saralegi, Vicente, and Gurrutxaga 2008; Gamallo 2007; Gamallo and Pichel 2008; Yu and Tsujii 2009a; Ismail and Manandhar 2010; Rubino and Linarés 2011; Tamura, Watanabe, and Sumita 2012; Aker, Paramita, and Gaizauskas 2013; Ansari et al 2014). So, a word in the target language is a translation candidate of a word in the source language if it tends to co-occur with the pairs of words from the seed words.…”
Section: Cross-lingual Word Similarity From Monolingual Corpora
Citation type: mentioning
confidence: 99%
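
The seed-word idea in the statement above can be illustrated with a minimal sketch. It assumes simple window-based co-occurrence counts and cosine similarity, which is a simplification and not necessarily the method of the cited paper (later statements report that it relies on dependency-based contexts). The toy sentences, the seed pairs, and the helper names `context_vector` and `cosine` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from math import sqrt

def context_vector(tokens, target, seed_index, window=2):
    """Count seed words found within `window` tokens of each occurrence of `target`.

    Seed words are mapped to a language-independent seed id, so vectors built
    from the two corpora live in the same space.
    """
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in seed_index:
                vec[seed_index[tokens[j]]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical seed pairs (Spanish, Galician), standing in for a bilingual dictionary.
seeds = [("comer", "comer"), ("beber", "beber"), ("pan", "pan"), ("leche", "leite")]
es_index = {es: i for i, (es, gl) in enumerate(seeds)}
gl_index = {gl: i for i, (es, gl) in enumerate(seeds)}

# Toy comparable (non-parallel) corpora, already tokenized.
es_tokens = "quiero comer queso con pan y beber vino".split()
gl_tokens = "quero comer queixo con pan e beber viño".split()

# Rank Galician candidates for the Spanish word "queso" by context similarity.
source_vec = context_vector(es_tokens, "queso", es_index)
candidates = ["queixo", "viño"]
ranked = sorted(
    candidates,
    key=lambda w: cosine(source_vec, context_vector(gl_tokens, w, gl_index)),
    reverse=True,
)
print(ranked)
```

Because both vectors are indexed by seed pair rather than by surface word, words from different languages become directly comparable without any parallel alignment.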
“…Unlike most approaches to extract word translations from monolingual corpora, which are based on windowing techniques without syntactic information, we will use a method that relies on dependency-based contexts. A significant number of papers report that contexts based on syntactic dependencies outperform window-based strategies in bilingual extraction (Gamallo and Pichel 2008; Yu and Tsujii 2009b; Andrade, Matsuzaki, and Tsujii 2011; Hazem and Morin 2014).…”
Section: Cross-lingual Word Similarity From Monolingual Corpora
Citation type: mentioning
confidence: 99%
“…Distributional similarity is the validation method we used in the two proposed strategies (transitivity and cognates). Candidates for translation equivalents: From each very comparable pair of articles, we calculate the distributional similarity and select the most similar word pairs, which are considered to be candidates for translation equivalents (Gamallo / Pichel 2008, Gamallo 2007). We also take into account multiwords.…”
Section: Basic Assumptions
Citation type: mentioning
confidence: 99%
“…More precisely, bilingual vectors are derived from both a bilingual dictionary used to define word contexts and non-parallel corpora used to obtain bilingual word co-occurrences with those contexts. Consequently, to build bilingual word vectors, first we employ the traditional approach to extract translation equivalents from non-parallel texts (Fung and Yee 1998; Rapp 1999; Gamallo 2007; Gamallo and Pichel 2008) and then, bilingual vectors are combined using the compositional model we have defined above in the current paper's section. The vectors were built by making use of a non-parallel corpus that consists of an English part containing the first 200M words from ukWaC corpus (Baroni et al 2009).…”
Section: A Case Study: Translating Polysemous Words
Citation type: mentioning
confidence: 99%