Lambros Cranias

Papageorgiou

Piperidis

1994

This paper addresses an important problem in Example-Based Machine Translation (EBMT), namely how to measure similarity between a sentence fragment and a set of stored examples. A new method is proposed that measures similarity according to both surface structure and content. A second contribution is the use of clustering to make retrieval of the best matching example from the database more efficient. Results on a large number of test cases from the CELEX database are presented.
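The abstract does not spell out the metric, so the following is only a minimal sketch of the general idea of scoring a fragment against stored examples "according to both surface structure and content". It assumes the following, none of which comes from the paper: surface structure is a sequence of part-of-speech tags compared by normalised edit distance, content is a set of content words compared by Jaccard overlap, and the two scores are mixed by a weight `alpha`. The names `Fragment`, `similarity`, `best_match` and `alpha` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    tags: tuple[str, ...]      # surface structure, e.g. part-of-speech tags (assumed)
    content: frozenset[str]    # content words / lemmas (assumed)


def edit_distance(a, b):
    # Levenshtein distance between two tag sequences (single-row dynamic programming)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def surface_similarity(a, b):
    # 1.0 for identical tag sequences, 0.0 when every position must change
    longest = max(len(a.tags), len(b.tags))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a.tags, b.tags) / longest


def content_similarity(a, b):
    # Jaccard overlap of the content-word sets
    union = a.content | b.content
    if not union:
        return 1.0
    return len(a.content & b.content) / len(union)


def similarity(a, b, alpha=0.5):
    # Hypothetical weighted combination of the surface and content scores
    return alpha * surface_similarity(a, b) + (1 - alpha) * content_similarity(a, b)


def best_match(query, examples, alpha=0.5):
    # Exhaustive scan of the stored examples; returns the highest-scoring one
    return max(examples, key=lambda e: similarity(query, e, alpha))


q = Fragment(("DET", "NOUN", "VERB"), frozenset({"council", "adopt"}))
e1 = Fragment(("DET", "NOUN", "VERB"), frozenset({"council", "adopt"}))
e2 = Fragment(("NOUN", "VERB", "NOUN"), frozenset({"member", "state"}))
print(best_match(q, [e2, e1]) is e1)  # True
```

Here `best_match` scans every stored example. That linear cost is what the clustering contribution is meant to reduce.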

Automatic alignment in parallel corpora

Papageorgiou

Piperidis

1994

This paper addresses the alignment problem in the context of exploiting large bilingual and multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, together with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. These representations are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database, with a success rate exceeding 99%. Future work will test the scheme's efficiency at lower levels (clause and phrase), given the necessary bilingual information about potential delimiters.
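As a rough illustration of how "the sum of its content tags" could feed a dynamic programming aligner, here is a minimal sketch. Several details are assumptions, not the paper's method. Each unit becomes a count vector over content-tag categories. Two groups of units are compared by the L1 distance between their summed vectors. The allowed moves (1-1, 1-0, 0-1, 2-1, 1-2) are borrowed from common sentence-alignment practice. The skip penalty `SKIP_COST` and all function names are hypothetical.

```python
from collections import Counter

SKIP_COST = 5.0  # hypothetical penalty for leaving a unit unaligned
MOVES = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]  # assumed alignment patterns


def merged(units):
    # Sum the content-tag count vectors of a group of consecutive units
    total = Counter()
    for u in units:
        total.update(u)
    return total


def distance(u, v):
    # L1 distance between two content-tag count vectors
    return sum(abs(u[k] - v[k]) for k in set(u) | set(v))


def align(source, target, skip_cost=SKIP_COST):
    """source, target: lists of Counters, one per unit.
    Returns a list of (source_indices, target_indices) pairs."""
    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor is final before it is extended
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in MOVES:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = skip_cost
                else:
                    step = distance(merged(source[i:ni]), merged(target[j:nj]))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace the optimal path back from (n, m)
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return pairs[::-1]


src = [Counter({"N": 1}), Counter({"V": 1})]
tgt = [Counter({"N": 1, "V": 1})]
print(align(src, tgt))  # [([0, 1], [0])]
```

In the usage example, the two source units merge into one target unit because their summed tag counts match it exactly.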

Example retrieval from a translation memory

1997

Clustering of a translation memory is proposed to make the retrieval of similar translation examples more efficient. A second contribution is a text similarity metric based on both surface structure and content. Both techniques are tested on part of the CELEX database. The reported results indicate that clustering the translation memory yields a significant gain in retrieval response time, while the loss in retrieval accuracy can be considered negligible. The proposed similarity metric was evaluated by a human expert and found to be consistent with human perception of text similarity.

Clustering: a technique for search space reduction in example-based machine translation

Papageorgiou

Piperidis

This work addresses an important problem in Example-Based Machine Translation (EBMT), namely how to make retrieval of the example that best matches the input more efficient. Clustering is proposed so that the same similarity metric can be applied first to limit the search space and then to locate the best available match in a database. Evaluation results are presented on a large number of test cases.
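The two-stage idea (use one similarity metric to pick a cluster, then reuse it to find the best match inside that cluster) can be sketched as follows. This is a minimal sketch under stated assumptions. Word-set Jaccard overlap stands in for the paper's metric. Clusters are built with a simple threshold ("leader") scheme, each represented by its first member. Neither choice is confirmed by the abstract, and the names `jaccard`, `leader_cluster`, `clustered_retrieve` and `threshold` are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    # Stand-in similarity metric: overlap of lower-cased word sets
    x, y = set(a.lower().split()), set(b.lower().split())
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def leader_cluster(examples, sim, threshold):
    # Each example joins the first cluster whose representative is similar
    # enough; otherwise it starts a new cluster and becomes its representative.
    clusters = []  # list of (representative, members)
    for ex in examples:
        for rep, members in clusters:
            if sim(ex, rep) >= threshold:
                members.append(ex)
                break
        else:
            clusters.append((ex, [ex]))
    return clusters


def clustered_retrieve(query, clusters, sim):
    # Stage 1: compare the query with cluster representatives only
    _, members = max(clusters, key=lambda c: sim(query, c[0]))
    # Stage 2: the same metric, applied exhaustively within one cluster
    return max(members, key=lambda ex: sim(query, ex))


memory = [
    "the council adopted the regulation",
    "the council adopted the directive",
    "member states shall comply with this directive",
    "member states shall notify the commission",
]
clusters = leader_cluster(memory, jaccard, threshold=0.25)
print(clustered_retrieve("the council adopted the directive today", clusters, jaccard))
# the council adopted the directive
```

The query is scored against two representatives and then two cluster members, rather than against all four stored examples. The price is that the best global match can be missed when it lies in an unselected cluster, which matches the small accuracy loss the 1997 abstract reports.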

A new optimal algorithm for the solution of a generalised assignment problem-application in automatic text alignment