Proceedings of the 31st Annual Meeting on Association for Computational Linguistics - 1993
DOI: 10.3115/981574.981576

Aligning sentences in bilingual corpora using lexical information

Abstract: In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on…
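The idea in the abstract, scoring candidate sentence pairs by their words rather than only by their lengths and then searching for the best overall alignment, can be illustrated with a toy sketch. This is not the paper's algorithm: the fixed `lexicon` dictionary, the `lexical_cost` function, and the `skip_cost` penalty are all assumptions made for illustration, whereas the paper builds a statistical translation model during alignment and maximizes a probability. The sketch only shows the general shape of a lexically informed dynamic-programming alignment over 1-1, 1-0, and 0-1 sentence beads.

```python
from typing import Dict, List, Optional, Tuple

# A bead pairs a source sentence index with a target sentence index;
# None marks a sentence left unaligned on that side.
Bead = Tuple[Optional[int], Optional[int]]


def lexical_cost(src: str, tgt: str, lexicon: Dict[str, str]) -> float:
    """Hypothetical cost: count of words on both sides left unexplained by the lexicon."""
    src_words = src.lower().split()
    tgt_words = tgt.lower().split()
    tgt_set = set(tgt_words)
    matched = sum(1 for w in src_words if lexicon.get(w) in tgt_set)
    return float(len(src_words) + len(tgt_words) - 2 * matched)


def align(src: List[str], tgt: List[str], lexicon: Dict[str, str],
          skip_cost: float = 2.0) -> List[Bead]:
    """Minimum-cost alignment by dynamic programming (skip_cost is an assumed penalty)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = []
            if i > 0 and j > 0:  # 1-1 bead
                pair = lexical_cost(src[i - 1], tgt[j - 1], lexicon)
                candidates.append((cost[i - 1][j - 1] + pair, (i - 1, j - 1)))
            if i > 0:  # source sentence with no translation
                candidates.append((cost[i - 1][j] + skip_cost, (i - 1, j)))
            if j > 0:  # target sentence with no source
                candidates.append((cost[i][j - 1] + skip_cost, (i, j - 1)))
            for c, prev in candidates:
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, prev
    # Walk the back-pointers from the end to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            beads.append((i - 1, j - 1))
        elif pi == i - 1:
            beads.append((i - 1, None))
        else:
            beads.append((None, j - 1))
        i, j = pi, pj
    beads.reverse()
    return beads


toy_lexicon = {"the": "la", "house": "maison", "is": "est", "red": "rouge", "thank": "merci"}
english = ["the house is red", "thank you"]
french = ["la maison est rouge", "bonjour", "merci"]
print(align(english, french, toy_lexicon))  # [(0, 0), (None, 1), (1, 2)]
```

In this toy run the extra French sentence "bonjour" is left unaligned because no lexicon entry supports pairing it with either English sentence, which is the kind of decision a purely length-based score cannot make.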

Citations: cited by 134 publications (72 citation statements)
References: 7 publications
“…However, these need a large number of previously aligned texts for training, which is a great hurdle for language pairs, such as Russian-German. Moreover, as Braune and Fraser (2010) note, a large number of them are also not completely language independent and not flexible to other language pairs (Chen, 1993; Fattah et al., 2007). Thus, supervised alignment cannot be easily applied to this data and we turn back to unsupervised approaches.…”
Section: Gargantua (mentioning)
confidence: 99%
“…The algorithm from Brown, Gale and Chen introduced the conception of anchor and divided the whole corpus into several smaller segments when aligning Hansard corpus [3]. It adopted the specific annotation from corpus to serve as anchor, and matched these anchors with dynamic planning algorithm.…”
Section: Related Work (mentioning)
confidence: 99%
“…So most parallel corpora are aligned in terms of sentences. Reviewing the literature on aligning parallel corpora, we found four main approaches to the problem of alignment at the sentence level: word length-based (Gale and Church 1991), character length-based (Brown et al. 1991), dictionary- or translation-based (Chen 1993, Melamed 1996, Moore 2002), and partial similarity-based (Simard and Plamondon 1998). In this experiment, the alignment of sentences was done entirely manually.…”
Section: Aligning the Parallel Corpus (mentioning)
confidence: 99%