Jan Rosendahl scite author profile

Jan Rosendahl

14 Publications

61 Citation Statements Received

147 Citation Statements Given

How they've been cited

How they cite others

189

147

Affiliations

RWTH Aachen University

Publications


The RWTH Aachen University Filtering System for the WMT 2018 Parallel Corpus Filtering Task

Rossenbach¹,

Rosendahl²,

Kim³

et al. 2018


This paper describes the submission of RWTH Aachen University for the De→En parallel corpus filtering task of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use several rule-based, heuristic methods to preselect sentence pairs. These sentence pairs are scored with count-based and neural systems as language and translation models. In addition to single sentence-pair scoring, we further implement a simple redundancy-removing heuristic. Our best-performing corpus filtering system relies on recurrent neural language models and translation models based on the transformer architecture. A model trained on 10M randomly sampled tokens reaches a performance of 9.2% BLEU on newstest2018. Using our filtering and ranking techniques, we achieve 34.8% BLEU.


The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018

Schamper¹,

Rosendahl²,

Bahar³

et al. 2018


This paper describes the statistical machine translation systems developed at RWTH Aachen University for the German→English, English→Turkish and Chinese→English translation tasks of the EMNLP 2018 Third Conference on Machine Translation (WMT 2018). We use ensembles of neural machine translation systems based on the Transformer architecture. Our main focus is on the German→English task, where we scored first with respect to all automatic metrics provided by the organizers. We identify data selection, fine-tuning, batch size and model dimension as important hyperparameters. In total, we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task. In the English→Turkish task, we show a 3.6% BLEU improvement over last year's winning system. We further report results on the Chinese→English task, where we improve by 2.2% BLEU on average over our baseline systems but stay behind the 2018 winning systems.


Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

Kim

Rosendahl²,

Rossenbach

et al. 2019


We propose a novel model architecture and training algorithm to learn bilingual sentence embeddings from a combination of parallel and monolingual data. Our method connects autoencoding and neural machine translation to force the source and target sentence embeddings to share the same space without the help of a pivot language or an additional transformation. We train a multilayer perceptron on top of the sentence embeddings to extract good bilingual sentence pairs from nonparallel or noisy parallel data. Our approach shows promising performance on sentence alignment recovery and the WMT 2018 parallel corpus filtering tasks with only a single model.

• We use a multilayer perceptron (MLP) as a trainable similarity measure to match source and target sentence embeddings.
• We compare various similarity measures for embeddings in terms of score distribution, geometric interpretation, and performance in downstream tasks.
• We demonstrate competitive performance in sentence alignment recovery and parallel cor-


Efficient Sequence Training of Attention Models using Approximative Recombination

Wynands¹,

Michel²,

Rosendahl³

et al. 2021

Preprint


The RWTH Aachen University English-German and German-English Machine Translation System for WMT 2017

Peter¹,

Guta²,

Alkhouli³

et al. 2017


This paper describes the statistical machine translation system developed at RWTH Aachen University for the English→German and German→English translation tasks of the EMNLP 2017 Second Conference on Machine Translation (WMT 2017). We use ensembles of attention-based neural machine translation systems for both directions. We use the provided parallel and synthetic data to train the models. In addition, we create a phrasal system using joint translation and reordering models in decoding and neural models in rescoring.


Locality-Sensitive Hashing for Long Context Neural Machine Translation

Petrick¹,

Rosendahl²,

Herold³

et al. 2022


Recurrent Attention for the Transformer

Rosendahl¹,

Herold²,

Petrick³

et al. 2021


In this work, we conduct a comprehensive investigation on one of the centerpieces of modern machine translation systems: the encoder-decoder attention mechanism. Motivated by the concept of first-order alignments, we extend the (cross-)attention mechanism by a recurrent connection, allowing direct access to previous attention/alignment decisions. We propose several ways to include such a recurrency into the attention mechanism. Verifying their performance across different translation tasks we conclude that these extensions and dependencies are not beneficial for the translation performance of the Transformer architecture.


Analysis of Positional Encodings for Neural Machine Translation

Rosendahl¹,

Tran²,

Wang³

et al. 2019



