Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation

Tezcan, Arda; Bulté, Bram; Vanroy, Bram

doi:10.3390/informatics8010007

Cited by 10 publications

(28 citation statements)

References 53 publications

“…In this study, we focus on a simple approach to TM-NMT integration, neural fuzzy repair (NFR), that relies on source sentence augmentation through the concatenation of translations of similar source sentences retrieved from a TM [3]. This method has been shown to work well with the Transformer architecture [29], with the FM retrieval being based on the cosine similarity of sentence embeddings [4,5]. In this paper, we do not focus on comparing different TM-MT integration methods, but rather on evaluating one NFR configuration that was shown to perform well in a previous study, using BLEU as evaluation metric [4].…”

Section: TM-MT Integration (mentioning)

confidence: 99%

“…This method has been shown to work well with the Transformer architecture [29], with the FM retrieval being based on the cosine similarity of sentence embeddings [4,5]. In this paper, we do not focus on comparing different TM-MT integration methods, but rather on evaluating one NFR configuration that was shown to perform well in a previous study, using BLEU as evaluation metric [4]. The NFR system evaluated in this study is presented in more detail in Section 4.2.…”

Section: TM-MT Integration (mentioning)

confidence: 99%

“…Machine translation (MT) systems are routinely evaluated using a restricted set of automated quality metrics, especially at early stages of development [1,2]. This was not different for neural fuzzy repair (NFR) [3][4][5], an MT data augmentation method that relies on the retrieval of translations of similar sentences, called fuzzy matches (FMs), from a translation memory (TM) or bilingual corpus. Using mainly BLEU [6], a metric quantifying the degree of exact overlap between MT output and a reference translation, substantial quality improvements were demonstrated between NFR systems and strong neural machine translation (NMT) baselines.…”

Section: Introduction (mentioning)

confidence: 99%
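
The citation statements above describe NFR as retrieving a fuzzy match by cosine similarity and concatenating its translation to the source sentence. A minimal sketch of that idea follows. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for real sentence embeddings, and the separator token `SEP`, the `threshold` value, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

SEP = "@@@"  # hypothetical separator token between source and fuzzy-match target


def embed(sentence):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment_source(source, tm, threshold=0.5):
    """Append the target side of the most similar TM entry to the source.

    `tm` is a list of (source, target) pairs. If no entry reaches the
    (assumed) similarity threshold, the source is returned unchanged.
    """
    query = embed(source)
    best_score, best_target = 0.0, None
    for tm_source, tm_target in tm:
        score = cosine(query, embed(tm_source))
        if score > best_score:
            best_score, best_target = score, tm_target
    if best_target is None or best_score < threshold:
        return source
    return f"{source} {SEP} {best_target}"


tm = [
    ("the cat sat on the mat", "de kat zat op de mat"),
    ("stock prices fell sharply", "de aandelenkoersen daalden scherp"),
]
augmented = augment_source("the cat sat on a mat", tm)
```

In a setup like this, the same augmentation would be applied to both the training data and the input sentences at translation time, so that the NMT model learns to make use of the appended fuzzy-match translation.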

Evaluating the Impact of Integrating Similar Translations into Neural Machine Translation

Tezcan; Bulté. 2022. Information. Self-citation.

Previous research has shown that simple methods of augmenting machine translation training data and input sentences with translations of similar sentences (or fuzzy matches), retrieved from a translation memory or bilingual corpus, lead to considerable improvements in translation quality, as assessed by a limited set of automatic evaluation metrics. In this study, we extend this evaluation by calculating a wider range of automated quality metrics that tap into different aspects of translation quality and by performing manual MT error analysis. Moreover, we investigate in more detail how fuzzy matches influence translations and where potential quality improvements could still be made by carrying out a series of quantitative analyses that focus on different characteristics of the retrieved fuzzy matches. The automated evaluation shows that the quality of NFR translations is higher than the NMT baseline in terms of all metrics. However, the manual error analysis did not reveal a difference between the two systems in terms of total number of translation errors; yet, different profiles emerged when considering the types of errors made. Finally, in our analysis of how fuzzy matches influence NFR translations, we identified a number of features that could be used to improve the selection of fuzzy matches for NFR data augmentation.

“…The percentage of these matches is usually calculated using an algorithm based on edit distance or Levenshtein distance (Levenshtein, 1966). In addition Tezcan, Bulté, & Vanroy (2021) reported that fuzzy matching techniques use different approaches to estimate the degree of similarity between two sentences by calculating: the percentage of tokens (or characters) that appear in both segments potentially allowing for synonyms and paraphrase, the length of the longest matching sequence of tokens, or n-gram matching, the edit distance between segments, the most commonly used metric in CAT tools, automated MT evaluation metrics such as translation edit rate (TER), the amount of overlap in syntactic parse trees, or a more recently proposed method, the distance between continuous sentence representations.…”

Section: Introductionmentioning

confidence: 99%
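
To make the edit-distance approach mentioned in the statement above concrete, here is a minimal sketch of a token-level fuzzy match score. The normalisation used (one minus the Levenshtein distance divided by the length of the longer segment) is one common convention and is an assumption here; individual CAT tools may compute their match percentages differently. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match_score(segment, candidate):
    """Token-level similarity in [0, 1]; 1.0 means the segments are identical."""
    a, b = segment.split(), candidate.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


score = fuzzy_match_score("the cat sat on the mat", "the cat sat on a mat")
```

Here one of six tokens differs, so the score is 5/6, or roughly an 83% match. Because only surface tokens are compared, a paraphrase or synonym substitution lowers the score even when the meaning is unchanged, which is the limitation that the other similarity measures listed in the statement try to address.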

Introducing linguistic transformation to improve translation memory retrieval. Results of a professional translators’ survey for Spanish, French and Arabic

Djabri; Quintana. 2021. Proceedings of the Student Research Workshop Associated With RANLP 2021.

Translation memory systems (TMS) are the main component of computer-assisted translation (CAT) tools. They store translations allowing to save time by presenting translations on the database through matching of several types such as fuzzy matches, which are calculated by algorithms like the edit distance. However, studies have demonstrated the linguistic deficiencies of these systems and the difficulties in data retrieval or obtaining a high percentage of matching, especially after the application of syntactic and semantic transformations as the active/passive voice change, change of word order, substitution by a synonym or a personal pronoun, for instance. This paper presents the results of a pilot study where we analyze the qualitative and quantitative data of questionnaires conducted with professional translators of Spanish, French and Arabic in order to improve the effectiveness of TMS and explore all possibilities to integrate further linguistic processing from ten transformation types. The results are encouraging, and they allowed us to find out about the translation process itself; from which we propose a pre-editing processing tool to improve the matching and retrieving processes.

“…In their recent research, Tezcan et al (2021) have proposed developing a 'neural fuzzy repair' method by using sub-word-level segmentation in fuzzy match combinations to maximise the coverage of source words. This method employs vector-based sentence similarity metrics for retrieving TM matches in combination with alignment-based features on overall translation quality.…”

Section: TM Integration With State-of-the-art NMT (mentioning)

confidence: 99%

Comparative Evaluation of Translation Memory (TM) and Machine Translation (MT) Systems in Translation between Arabic and English

Khaled

In general, advances in translation technology tools have enhanced translation quality significantly. Unfortunately, however, it seems that this is not the case for all language pairs. A concern arises when the users of translation tools want to work between different language families such as Arabic and English. The main problems facing Arabic<>English translation tools lie in Arabic’s characteristic free word order, richness of word inflection – including orthographic ambiguity – and optionality of diacritics, in addition to a lack of data resources. The aim of this study is to compare the performance of translation memory (TM) and machine translation (MT) systems in translating between Arabic and English. The research evaluates the two systems based on specific criteria relating to needs and expected results. The first part of the thesis evaluates the performance of a set of well-known TM systems when retrieving a segment of text that includes an Arabic linguistic feature. As it is widely known that TM matching metrics are based solely on the use of edit distance string measurements, it was expected that the aforementioned issues would lead to a low match percentage. The second part of the thesis evaluates multiple MT systems that use the mainstream neural machine translation (NMT) approach to translation quality. Due to a lack of training data resources and its rich morphology, it was anticipated that Arabic features would reduce the translation quality of this corpus-based approach. The systems’ output was evaluated using both automatic evaluation metrics including BLEU and hLEPOR, and TAUS human quality ranking criteria for adequacy and fluency. The study employed a black-box testing methodology to experimentally examine the TM systems through a test suite instrument and also to translate Arabic English sentences to collect the MT systems’ output. A translation threshold was used to evaluate the fuzzy matches of TM systems, while an online survey was used to collect participants’ responses to the quality of MT system’s output. The experiments’ input of both systems was extracted from Arabic<>English corpora, which was examined by means of quantitative data analysis. The results show that, when retrieving translations, the current TM matching metrics are unable to recognise Arabic features and score them appropriately. In terms of automatic translation, MT produced good results for adequacy, especially when translating from Arabic to English, but the systems’ output appeared to need post-editing for fluency. Moreover, when retrieving from Arabic, it was found that short sentences were handled much better by MT than by TM. The findings may be given as recommendations to software developers.
