Recently, the capability of character-level evaluation measures for machine translation output has been confirmed by several metrics. This work proposes translation edit rate on character level (CharacTER), which calculates the character level edit distance while performing the shift edit on word level. The novel metric shows high system-level correlation with human rankings, especially for morphologically rich languages. It outperforms the strong CHRF by up to 7% correlation on different metric tasks. In addition, we apply the hypothesis sentence length for normalizing the edit distance in CharacTER, which also provides significant improvements compared to using the reference sentence length.
We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.• We use a multilayer perceptron (MLP) as a trainable similarity measure to match source and target sentence embeddings.• We compare various similarity measures for embeddings in terms of score distribution, geometric interpretation, and performance in downstream tasks.• We demonstrate competitive performance in sentence alignment recovery and parallel cor-
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.