We identify a number of aspects that can boost the performance of Neural Fuzzy Repair (NFR), an easy-to-implement method to integrate translation memory matches and neural machine translation (NMT). We explore various ways of maximising the added value of retrieved matches within the NFR paradigm for eight language combinations, using Transformer NMT systems. In particular, we test the impact of different fuzzy matching techniques, sub-word-level segmentation methods and alignment-based features on overall translation quality. Furthermore, we propose a fuzzy match combination technique that aims to maximise the coverage of source words. This is supplemented with an analysis of how translation quality is affected by input sentence length and fuzzy match score. The results show that applying a combination of the tested modifications leads to a significant increase in estimated translation quality over all baselines for all language combinations.
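The coverage-maximising match combination described in the abstract can be pictured as a greedy selection over candidate translation-memory matches. The sketch below is an illustrative assumption, not the authors' implementation: each candidate is assumed to carry the set of source-token positions it covers (e.g. derived from a token-level fuzzy-match alignment).

```python
# Greedy sketch of combining fuzzy matches to maximise source-word coverage.
# Assumed input: (match_id, covered_positions) pairs, where covered_positions
# is the set of source-token indices the match accounts for.

def combine_matches(candidates, max_matches=3):
    """Repeatedly pick the match that adds the most not-yet-covered tokens."""
    covered, selected = set(), []
    pool = list(candidates)
    for _ in range(max_matches):
        best = max(pool, key=lambda c: len(c[1] - covered), default=None)
        if best is None or not (best[1] - covered):
            break  # no remaining match adds new coverage
        selected.append(best[0])
        covered |= best[1]
        pool.remove(best)
    return selected, covered

# Example: "m1" and "m3" together cover all six source tokens, so "m2" is skipped.
sel, cov = combine_matches([("m1", {0, 1, 2}), ("m2", {2, 3}), ("m3", {3, 4, 5})])
```

In the NFR setup, the selected matches would then be concatenated with the source sentence as additional NMT input.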
We propose three linguistically motivated metrics to quantify syntactic equivalence between a source sentence and its translation. First, Syntactically Aware Cross (SACr) measures the degree of word group reordering by aligning syntactically motivated groups of words. Secondly, a more intuitive approach compares the linguistic labels of word-aligned source and target tokens. Finally, on a deeper linguistic level, Aligned Syntactic Tree Edit Distance (ASTrED) compares the dependency structures of the two sentences. To make source and target dependency labels comparable, we rely on Universal Dependencies (UD). We analyse our metrics by comparing them with translation process data in mixed models. Although our examples and analysis focus on English as the source language and Dutch as the target language, the proposed metrics can be applied to any language for which UD models exist. An open-source implementation is made available.
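As a minimal illustration of the label-comparison metric (a sketch under assumed inputs, not the released implementation), one can compute the share of word-aligned token pairs whose UD dependency labels agree:

```python
# Share of aligned source-target token pairs with matching UD dependency labels.
# Inputs are assumed to come from a word aligner and UD parsers for both sides.

def label_agreement(alignments, src_labels, tgt_labels):
    """alignments: (src_index, tgt_index) pairs; labels: per-token UD relations."""
    if not alignments:
        return 0.0
    same = sum(src_labels[s] == tgt_labels[t] for s, t in alignments)
    return same / len(alignments)

# Two of the three aligned pairs keep their dependency label ("obj" -> "obl" differs).
score = label_agreement([(0, 1), (1, 0), (2, 2)],
                        ["nsubj", "root", "obj"],
                        ["root", "nsubj", "obl"])
```

Using the cross-linguistically consistent UD label set is what makes this direct source-target comparison meaningful.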
This article analyses the extent to which four well-known general cognitive constraints – syntactic priming, cognitive routinisation, markedness of coding and structural integration – impact the linguistic output of translation students and professional translators similarly. It takes subject placement variation in Dutch as a test case to gauge the effect of the four constraints and relies on a controlled corpus of student and professional French-to-Dutch L1 news translations, from which all declarative main clauses with either a preverbal or a postverbal subject were extracted. All corpus instances were annotated for four random variables, the fixed variable expertise and ten other fixed variables, which were considered good proxies for the cognitive constraints. A mixed-effects regression analysis reveals that by and large the cognitive constraints have an identical effect on student and professional translators’ output, with priming and structural integration having the strongest impact on subject placement. However, students diverge from professionals when translating French clauses with a left-dislocated adjunct into Dutch, which is interpreted as an indication of a difference in automatisation when dealing with specific French-Dutch cross-linguistic differences.
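The kind of mixed-effects analysis described above can be sketched with statsmodels. The data below are synthetic and purely illustrative (the study's corpus is not reproduced here), a linear mixed model with a random intercept stands in for the full specification, and all variable names are assumptions; the study's actual model includes four random variables and eleven fixed variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic illustrative data: binary subject placement per clause, with
# translator as a grouping factor and two illustrative fixed effects.
rng = np.random.default_rng(0)
n = 200
translator = rng.integers(0, 10, n)          # grouping factor (random intercept)
expertise = (translator < 5).astype(int)     # 1 = professional, 0 = student
primed = rng.integers(0, 2, n)               # e.g. postverbal subject in the prime
postverbal = (0.2 * expertise + 0.4 * primed
              + rng.normal(0, 0.3, n) > 0.4).astype(int)

data = pd.DataFrame({"postverbal": postverbal, "expertise": expertise,
                     "primed": primed, "translator": translator})

# Mixed-effects regression with a random intercept per translator.
model = smf.mixedlm("postverbal ~ expertise + primed", data,
                    groups=data["translator"])
result = model.fit()
print(result.fe_params)
```

The fitted fixed-effect coefficients then indicate how strongly each constraint proxy shifts the odds of a postverbal subject, while the per-translator random intercepts absorb individual baseline preferences.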