Proceedings of the 16th Conference on Computational Linguistics - 1996
DOI: 10.3115/992628.992636

Extracting word correspondences from bilingual corpora based on word co-occurrences information

Abstract: A new method has been developed for extracting word correspondences from a bilingual corpus. First, the co-occurrence information for each word in both languages is extracted from the corpus. Then, the correlations between the co-occurrence features of the words are calculated pairwisely with the assistance of a basic word bilingual dictionary. Finally, the pairs of words with the highest correlations are output selectively. This method is applicable to rather small, unaligned corpora; it can extract corres…
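The three steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the window size, the correlation measure, or the selection rule, so the `window` parameter, the weighted-Jaccard `overlap` function, the `threshold` cut-off, and all function names below are assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Step 1: for each word, count the words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_features(vector, seed_dict):
    """Map source-language features into the target language through the
    seed dictionary; features without a dictionary entry are dropped."""
    mapped = Counter()
    for feature, count in vector.items():
        for target in seed_dict.get(feature, ()):
            mapped[target] += count
    return mapped


def overlap(a, b):
    """Assumed correlation measure: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    denom = sum(max(a[k], b[k]) for k in keys)
    return sum(min(a[k], b[k]) for k in keys) / denom if denom else 0.0


def extract_pairs(src_sents, tgt_sents, seed_dict, threshold=0.5):
    """Steps 2 and 3: score every source/target pair, keep the best
    target per source word if its score reaches `threshold`."""
    src = cooccurrence_vectors(src_sents)
    tgt = cooccurrence_vectors(tgt_sents)
    pairs = []
    for s_word, s_vec in src.items():
        mapped = translate_features(s_vec, seed_dict)
        scored = [(overlap(mapped, t_vec), t_word) for t_word, t_vec in tgt.items()]
        if not scored:
            continue
        score, best = max(scored)
        if score >= threshold:
            pairs.append((s_word, best, score))
    return sorted(pairs, key=lambda p: -p[2])
```

With a toy corpus such as `[["chat", "noir", "dort"]]` and `[["black", "cat", "sleeps"]]` and a seed dictionary covering only `noir` and `dort`, the unknown word `chat` is matched to `cat` because their translated co-occurrence features coincide. Note that the corpora need not be sentence-aligned: each side is processed independently and only the seed dictionary links them.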

Cited by 28 publications (18 citation statements)
References 8 publications
“…For example, Sadat et al. (2003) automatically extracted bilingual word pairs from comparable corpora. Others have leveraged parallel corpora or bilingual dictionaries for lexical acquisition (Echizen-ya et al. 2006; Kaji and Aizono 1996; Rapp 1999; Tanaka and Iwasaki 1996). However, our work deals with the fundamentally different task of translating a large thesaurus, where one can leverage the structural properties of the resource.…”
Section: Related Work (mentioning)
confidence: 99%
“…Among previous studies, one [6] uses the co-occurrence of words depending on the number of co-occurrence words and their frequency. Such a method is insufficient in terms of efficient extraction of bilingual word pairs.…”
Section: Experiments and Discussion (mentioning)
confidence: 99%
“…For Cosine, the association values of two words with the same context are joined using their product, while for JaccardMin (Grefenstette 1994; Kaji and Aizono 1996) and DiceMin (Curran et al. 2002; van der Plas and Bouma 2004; Gamallo 2007) only the smallest association weight is considered. For the Lin coefficient, the association values of common contexts are summed (Lin 1998), where $c_j \in C_{1,2}$ if and only if $A(w_1, c_j) > 0$ and $A(w_2, c_j) > 0$.…”
Section: Ten Similarity Measures (mentioning)
confidence: 99%
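The four measures named in this statement differ in how they combine the association weights of shared contexts. The sketch below follows the quoted description for the numerators (product for Cosine, minimum for JaccardMin and DiceMin, sum for Lin); the denominators are the commonly used normalisations and are an assumption here, since the excerpt does not state them. Each word is represented as a hypothetical dictionary mapping a context to a non-negative association weight A(w, c).

```python
import math


def common_contexts(a, b):
    """Contexts c with A(w1, c) > 0 and A(w2, c) > 0."""
    return [c for c in a if a[c] > 0 and b.get(c, 0) > 0]


def cosine(a, b):
    # Shared contexts are joined by the product of their weights.
    num = sum(a[c] * b[c] for c in common_contexts(a, b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def jaccard_min(a, b):
    # Only the smaller weight of each shared context is counted.
    num = sum(min(a[c], b[c]) for c in common_contexts(a, b))
    den = sum(max(a.get(c, 0), b.get(c, 0)) for c in set(a) | set(b))
    return num / den if den else 0.0


def dice_min(a, b):
    # Same numerator idea as JaccardMin, normalised by the total weight.
    num = 2 * sum(min(a[c], b[c]) for c in common_contexts(a, b))
    den = sum(a.values()) + sum(b.values())
    return num / den if den else 0.0


def lin(a, b):
    # Weights of shared contexts are summed.
    num = sum(a[c] + b[c] for c in common_contexts(a, b))
    den = sum(a.values()) + sum(b.values())
    return num / den if den else 0.0
```

For example, with `w1 = {"x": 2.0, "y": 1.0}` and `w2 = {"x": 1.0, "z": 1.0}`, the only shared context is `x`, so JaccardMin counts a numerator of 1.0 (the smaller weight) while Lin counts 3.0 (the sum of both weights).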