Felix Hieber scite author profile

2017

The performance of Neural Machine Translation (NMT) models relies heavily on the availability of sufficient amounts of parallel data, and an efficient and effective way of leveraging the vastly available amounts of monolingual data has yet to be found. We propose to modify the decoder in a neural sequence-to-sequence model to enable multi-task learning for two strongly related tasks: target-side language modeling and translation. The decoder predicts the next target word through two channels, a target-side language model on the lowest layer, and an attentional recurrent model which is conditioned on the source representation. This architecture allows joint training on both large amounts of monolingual and moderate amounts of bilingual data to improve NMT performance. Initial results in the news domain for three language pairs show moderate but consistent improvements over a baseline trained on bilingual data only.

Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval

Schamoni

Sokolov

et al. 2014

We present an approach to cross-language retrieval that combines dense knowledgebased features and sparse word translations. Both feature types are learned directly from relevance rankings of bilingual documents in a pairwise ranking framework. In large-scale experiments for patent prior art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learningto-rank with either only dense or only sparse features, and over very competitive baselines that combine state-of-the-art machine translation and retrieval.

Learning to translate queries for CLIR

Sokolov

Riezler

2014

The statistical machine translation (SMT) component of cross-lingual information retrieval (CLIR) systems is often regarded as black box that is optimized for translation quality independent from the retrieval task. In recent work [10], SMT has been tuned for retrieval by training a reranker on k-best translations ordered according to their retrieval performance. In this paper we propose a decomposable proxy for retrieval quality that obviates the need for costly intermediate retrieval. Furthermore, we explore the full search space of the SMT decoder by directly optimizing decoder parameters under a retrieval-based objective. Experimental results for patent retrieval show our approach to be a promising alternative to the standard pipeline approach.

Improving the Quality Trade-Off for Neural Machine Translation Multi-Domain Adaptation

Hasler¹,

Domhan²,

Trénous³

et al. 2021

Building neural machine translation systems to perform well on a specific target domain is a well-studied problem. Optimizing system performance for multiple, diverse target domains however remains a challenge. We study this problem in an adaptation setting where the goal is to preserve the existing system quality while incorporating data for domains that were not the focus of the original translation system. We find that we can improve over the performance trade-off offered by Elastic Weight Consolidation with a relatively simple data mixing strategy. At comparable performance on the new domains, catastrophic forgetting is mitigated significantly on strong WMT baselines. Combining both approaches improves the Pareto frontier on this task.

Improved answer ranking in social question-answering portals

Riezler

2011

Community QA portals provide an important resource for non-factoid question-answering. The inherent noisiness of user-generated data makes the identification of high-quality content challenging but all the more important. We present an approach to answer ranking and show the usefulness of features that explicitly model answer quality. Furthermore, we introduce the idea of leveraging snippets of web search results for query expansion in answer ranking. We present an evaluation setup that avoids spurious results reported in earlier work. Our results show the usefulness of our features and query expansion techniques, and point to the importance of regularization when learning from noisy data.