Proceedings of the 29th Annual Meeting on Association for Computational Linguistics - 1991
DOI: 10.3115/981344.981367

A program for aligning sentences in bilingual corpora

Abstract: Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying parallel texts, texts such as the Canadian Hansards (parliamentary proceedings), which are available in multiple languages (French and English). This paper describes a method for aligning sentences in these parallel texts, based on a simple statistical model of character lengths. The method was developed and tested on a small trilingual s…
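The abstract describes alignment driven only by sentence lengths in characters. The following is a minimal sketch of that idea: a dynamic program that chooses, at each step, how many source and target sentences form the next "bead" and scores each bead by how far the character lengths deviate from a normal model. The abstract as shown is truncated and gives no parameters, so the constants `C` and `S2`, the six bead types, and their prior probabilities are assumptions (they are the values usually quoted for this model). The function names `bead_cost` and `align` are hypothetical.

```python
from __future__ import annotations

import math

# Assumed parameters (values usually quoted for the Gale and Church model;
# not stated in the truncated abstract).
C = 1.0   # expected target characters per source character
S2 = 6.8  # variance of target length per source character

# Assumed prior probability of each bead type: (source sentences, target sentences).
PRIORS = {
    (1, 1): 0.89,
    (1, 0): 0.0099,
    (0, 1): 0.0099,
    (2, 1): 0.089,
    (1, 2): 0.089,
    (2, 2): 0.011,
}


def bead_cost(len1: int, len2: int, prior: float) -> float:
    """Return -log(P(bead type) * P(delta | bead type)) for len1/len2 characters."""
    # Average the two lengths so insertions/deletions (one side empty)
    # do not divide by zero.
    mean = (len1 + len2 / C) / 2.0
    if mean == 0:
        return -math.log(prior)
    delta = (len2 - C * len1) / math.sqrt(S2 * mean)
    # Two-tailed normal probability of a deviation at least |delta|.
    p_delta = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(prior) - math.log(max(p_delta, 1e-300))


def align(src_lens: list[int], tgt_lens: list[int]) -> list[tuple[list[int], list[int]]]:
    """Minimum-cost alignment of two sentence sequences given character lengths."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for (di, dj), prior in PRIORS.items():
                if di > i or dj > j or cost[i - di][j - dj] == inf:
                    continue
                total = cost[i - di][j - dj] + bead_cost(
                    sum(src_lens[i - di:i]), sum(tgt_lens[j - dj:j]), prior
                )
                if total < cost[i][j]:
                    cost[i][j] = total
                    back[i][j] = (di, dj)
    # Trace back from the end to recover the beads in order.
    beads = []
    i, j = n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

For example, `align([10, 20, 30], [11, 19, 31])` pairs the sentences one-to-one, while `align([10, 10], [20])` merges both source sentences into a single 2-1 bead, because one 20-character target sentence matches their combined length. Working only with lengths keeps the model language-independent, at the cost of ignoring any lexical evidence.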

Cited by 300 publications (356 citation statements). References 9 publications.
“…It can be noticed that only the methods that consider cognate words as an alignment criterion had success in omissions. In [7], Gale and Church had already mentioned the necessity of considering language-specific methods to deal adequately with this alignment category and this point was confirmed by the results reported in this paper.…”
Section: Evaluation and Results Of Sentence Alignment Methods (supporting)
confidence: 77%
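The excerpt above reports that only aligners using cognate words as a criterion handled omissions, something a purely length-based model cannot detect. As a rough illustration of what such a lexical signal looks like, here is a minimal sketch of a cognate score between two sentences. The four-character prefix rule, the greedy one-to-one matching, and the Dice-style ratio are all assumptions chosen for simplicity, not the method of any aligner evaluated in the quoted paper; `tokenize`, `is_cognate`, and `cognate_score` are hypothetical names.

```python
from __future__ import annotations

import re

PREFIX_LEN = 4  # assumed: words sharing their first 4 characters count as cognates


def tokenize(sentence: str) -> list[str]:
    """Lowercase word tokens (Unicode-aware \\w runs)."""
    return re.findall(r"\w+", sentence.lower())


def is_cognate(a: str, b: str, k: int = PREFIX_LEN) -> bool:
    """Prefix heuristic; short words and numbers must match exactly."""
    if len(a) < k or len(b) < k or a.isdigit() or b.isdigit():
        return a == b
    return a[:k] == b[:k]


def cognate_score(src: str, tgt: str) -> float:
    """Dice-style ratio of greedily matched cognate pairs to all tokens."""
    s, t = tokenize(src), tokenize(tgt)
    if not s or not t:
        return 0.0
    used: set[int] = set()
    matches = 0
    for a in s:
        for idx, b in enumerate(t):
            if idx not in used and is_cognate(a, b):
                used.add(idx)
                matches += 1
                break
    return 2 * matches / (len(s) + len(t))
```

With these rules, "Parliament adopted the budget" and "Le parlement adopte le budget" share three cognate pairs (parliament/parlement, adopted/adopte, budget/budget). The prefix rule is cheap but coarse: it misses pairs such as "government"/"gouvernement", whose first four characters differ.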
“…Precision stands for the number of … (Footnote 6: It is important to say that CorpusPE was evaluated with 64 pairs rather than 65 because we note that one of them was not parallel at lexical level.) (Footnote 7: Available in http://www.d.umn.edu/ tdeperse/code.html.)…”
Section: Evaluation and Results (mentioning)
confidence: 99%
“…The features for the classification process are three: (i) and (ii) are the score of the most likely alignments at sentence and paragraph level between d_hi and d_en, respectively. These scores were computed with the length based alignment algorithm proposed by [13]. (iii) is a lexical feature: A Hindi-English dictionary was used to gloss the Hindi documents and calculate an idf-based cosine similarity between suspicious and potential source documents.…”
Section: Submissions Overview (mentioning)
confidence: 99%
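Feature (iii) in the excerpt above compares a dictionary-glossed document with candidate source documents using idf-weighted cosine similarity. A minimal sketch of that comparison follows. It assumes the Hindi text has already been glossed into English tokens (the dictionary lookup is not shown), uses raw term frequency times log(N/df) as the weighting, and computes idf over the glossed document plus the candidates; the quoted submission's exact weighting and tokenization are not stated, and `idf_weights`, `tfidf`, `cosine`, and `rank_sources` are hypothetical names.

```python
from __future__ import annotations

import math
from collections import Counter


def idf_weights(docs: list[list[str]]) -> dict[str, float]:
    """idf(t) = log(N / df(t)) over a collection of tokenized documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times idf; terms without an idf weight get 0."""
    return {term: tf * idf.get(term, 0.0) for term, tf in Counter(doc).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sources(glossed: list[str], sources: list[list[str]]) -> list[tuple[int, float]]:
    """Rank candidate source documents by idf-weighted cosine to a glossed document."""
    idf = idf_weights([glossed, *sources])
    query = tfidf(glossed, idf)
    scores = [(i, cosine(query, tfidf(src, idf))) for i, src in enumerate(sources)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For example, a glossed document `["court", "appeal", "ruling"]` ranks a source containing those words above one about `["budget", "tax", "vote"]`, which scores 0.0. Weighting by idf keeps frequent function words from dominating the similarity, which matters when a word-by-word gloss produces many of them.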