Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics - 1999
DOI: 10.3115/1034678.1034756

Automatic identification of word translations from unrelated English and German corpora

Abstract: Algorithms for the alignment of words in translated texts are well established. However, only recently new approaches have been proposed to identify word translations from non-parallel or even unrelated texts. This task is more difficult, because most statistical clues useful in the processing of parallel texts cannot be applied to non-parallel texts. Whereas for parallel texts in some studies up to 99% of the word alignments have been shown to be correct, the accuracy for non-parallel texts has been around 30…

Cited by 263 publications (282 citation statements)
References: 23 publications
“…If two documents are mutual translations, the sequence of positions of those terms should be correlated. Much past research (Ma and Liberman, 1999; Rapp, 1999) has exploited these features, using a fixed-size window and counting the co-occurrences in this range. This method, however, requires considerable tuning of parameters, and if two shared terms are located outside of the window, no credit will be added.…”
Section: Term Position Similarity (Ufal-1)
Type: mentioning
confidence: 99%
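The fixed-size window counting that this statement describes can be illustrated with a minimal sketch. It is not the implementation from Rapp (1999) or Ma and Liberman (1999); the function name `window_cooccurrences`, the `window` parameter, and the sample tokens are hypothetical and chosen only for illustration.

```python
from collections import Counter


def window_cooccurrences(tokens, window=2):
    """Count unordered token pairs that occur within `window` positions of each other.

    Illustrative only: real systems typically add lemmatization, stop-word
    filtering, and an association measure on top of the raw counts.
    """
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only at the next `window` tokens so each pair is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical sample sentence, already tokenized.
tokens = ["teacher", "asked", "student", "about", "school"]
near = window_cooccurrences(tokens, window=2)

print(near[("student", "teacher")])  # pair lies inside the window
print(near[("school", "teacher")])   # pair lies outside the window
```

With `window=2`, "teacher" and "student" are two positions apart and are counted, whereas "teacher" and "school" are four positions apart and receive no credit. This is the limitation the statement points out, and it is why the window size needs tuning.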
“…the similarity of document URLs or language tags within URLs), some emphasize more the actual content of the documents. Previous work (Rapp, 1999; Ma and Liberman, 1999) focused on document alignment by counting word co-occurrences between source and target documents in a fixed-size window. More recently, methods from cross-lingual information retrieval (CLIR) have been used (Snover et al., 2008; Abdul Rauf and Schwenk, 2011), ranking lists of target documents given a source document by a probabilistic model.…”
Section: Introduction
Type: mentioning
confidence: 99%
“…Our approach for building bilingual dictionaries has been influenced by Rapp (1999) and Koehn and Knight (2002). Also in our approach we build a seed dictionary within the first phase of building bilingual dictionary, but opposite to Koehn and Knight (2002), we show that knowledge-poor data mining methods can be used successfully even for the languages belonging to different families (English and Polish).…”
Section: Related Work
Type: mentioning
confidence: 99%
“…These include exploring query logs (Brill et al., 2001), unrelated corpus (Rapp, 1999), and comparable corpus (Fung & Yee, 1998; Huang, Zhang, & Vogel, 2005; Nie et al., 1999). To establish correspondence, these algorithms usually rely on one or more statistical clues, such as the correlation between word frequencies, cognates of similar spelling, or pronunciations.…”
Section: Related Work
Type: mentioning
confidence: 99%