This paper addresses an important problem in Example-Based Machine Translation (EBMT), namely how to measure similarity between a sentence fragment and a set of stored examples. A new method is proposed that measures similarity according to both surface structure and content. A second contribution is the use of clustering to make retrieval of the best matching example from the database more efficient. Results on a large number of test cases from the CELEX database are presented.
This paper addresses the alignment issue in the framework of exploiting large bilingual and multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. The results are then fed into a dynamic programming framework that computes the optimum alignment of units. The proposed scheme has been tested at sentence level on parallel corpora of the CELEX database. The success rate exceeded 99%. The next steps of the work concern testing the scheme's efficiency at lower levels, given the necessary bilingual information about potential delimiters.
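The dynamic programming step described above can be illustrated with a minimal sketch. It assumes that each unit has already been reduced to a single number (its content-tag count) and that the cost of pairing units is the absolute difference of those counts. The function name `align_units`, the set of allowed pairings (1-1, 1-0, 0-1, 2-1, 1-2) and the penalty values are all hypothetical choices made for illustration; the abstract does not specify them.

```python
def align_units(src, tgt, skip_penalty=3.0, merge_penalty=1.0):
    """Align two sequences of units by dynamic programming.

    src, tgt: lists of content-tag counts, one number per text unit.
    Returns a list of (source_indices, target_indices) pairs.
    The cost model and penalties are illustrative assumptions.
    """
    # Allowed pairings: (source units consumed, target units consumed, penalty).
    beads = [(1, 1, 0.0), (1, 0, skip_penalty), (0, 1, skip_penalty),
             (2, 1, merge_penalty), (1, 2, merge_penalty)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Skipped units carry only the fixed penalty.
                mismatch = abs(sum(src[i:ni]) - sum(tgt[j:nj])) if di and dj else 0
                candidate = cost[i][j] + penalty + mismatch
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Follow the back-pointers from the end to recover the alignment.
    path = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    path.reverse()
    return path
```

With `src = [5, 3, 4]` and `tgt = [5, 7]`, the sketch pairs the first units one to one and merges the second and third source units against the second target unit, since their combined count matches it exactly.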
Clustering of a translation memory is proposed to make the retrieval of similar translation examples from a translation memory more efficient, while a second contribution is a metric of text similarity which is based on both surface structure and content. Tests on the two proposed techniques are run on part of the CELEX database. The results reported indicate that the clustering of the translation memory results in a significant gain in the retrieval response time, while the deterioration in the retrieval accuracy can be considered to be negligible. The text similarity metric proposed is evaluated by a human expert and found to be compatible with the human perception of text similarity.
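As a rough illustration of a metric that draws on both surface structure and content, the sketch below combines two scores. It is not the metric evaluated in the paper, whose definition is not given in the abstract. It assumes that input is a list of `(token, tag)` pairs, that the hypothetical tag `"F"` marks function words, that surface structure can be approximated by the sequence of function words and tags, and that content can be approximated by the overlap of the remaining words. The name `similarity` and the `weight` parameter are likewise assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b, weight=0.5):
    """Combine a surface-structure score and a content score in [0, 1].

    a, b: lists of (token, tag) pairs; the tag "F" (hypothetical) marks
    function words. weight is the share given to the surface score.
    """
    def surface(units):
        # Keep function words as they are, replace other words by their tag.
        return [tok.lower() if tag == "F" else tag for tok, tag in units]

    def content(units):
        return {tok.lower() for tok, tag in units if tag != "F"}

    # Surface score: similarity of the two function-word/tag sequences.
    surface_score = SequenceMatcher(None, surface(a), surface(b)).ratio()

    # Content score: Jaccard overlap of the content words.
    ca, cb = content(a), content(b)
    union = ca | cb
    content_score = len(ca & cb) / len(union) if union else 1.0

    return weight * surface_score + (1 - weight) * content_score
```

Two sentences with the same structure but one differing content word, such as "the cat sleeps" and "the dog sleeps", receive a full surface score and a partial content score, so they rank above a pair that differs in both respects.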
This work addresses an important problem in Example-Based Machine Translation (EBMT), namely how to make retrieval of the example that best matches the input more efficient. The use of clustering is proposed, to enable the application of the same similarity metric to first limit the search space and then locate the best available match in a database. Evaluation results are presented on a large number of test cases.
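The two-stage use of one similarity metric can be sketched as follows: the metric first selects the closest cluster, then the best example inside it. This is a minimal sketch under stated assumptions, not the clustering procedure of the paper. A simple word-overlap score (`jaccard`) stands in for the real metric, clusters are built greedily around a first member acting as representative, and the names `build_clusters`, `retrieve` and the `threshold` value are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: word overlap between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def build_clusters(examples, threshold=0.3, sim=jaccard):
    """Greedy clustering: each cluster is (representative, members)."""
    clusters = []
    for example in examples:
        # Find the representative most similar to this example, if any.
        best = max(clusters, key=lambda c: sim(example, c[0]), default=None)
        if best is not None and sim(example, best[0]) >= threshold:
            best[1].append(example)
        else:
            # No representative is close enough: start a new cluster.
            clusters.append((example, [example]))
    return clusters


def retrieve(query, clusters, sim=jaccard):
    """Stage 1: pick the closest cluster. Stage 2: search only inside it."""
    _, members = max(clusters, key=lambda c: sim(query, c[0]))
    return max(members, key=lambda m: sim(query, m))


examples = [
    "the council adopted the regulation",
    "the council adopted the directive",
    "member states shall notify the commission",
    "member states shall inform the commission",
]
clusters = build_clusters(examples)
best_match = retrieve("the council adopted a directive", clusters)
```

The query is compared with one representative per cluster and then with the members of a single cluster, rather than with every stored example. The price is that the best match is missed whenever it sits in a cluster whose representative is not the closest one, which is the accuracy trade-off such a scheme has to measure.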