Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/W19-5305

GTCOM Neural Machine Translation Systems for WMT19

Abstract: This paper describes Global Tone Communication Co., Ltd.'s submission to the WMT19 shared news translation task. We participate in six directions: English to (Gujarati, Lithuanian and Finnish) and (Gujarati, Lithuanian and Finnish) to English. We achieve the best BLEU scores among all participants in the English-to-Gujarati and Lithuanian-to-English directions (28.2 and 36.3, respectively). The submitted systems mainly focus on backtranslation, knowledge distillation and reranking to build a competitive model with transformer architecture. …
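Since back-translation is the centerpiece of the submission, a minimal sketch of the idea follows, using Fairseq's pretrained-model interface. The Lithuanian-to-English direction and all paths are illustrative assumptions, not the authors' actual setup: a reverse-direction model translates monolingual target-side text back into the source language to produce synthetic parallel data.

```python
# A minimal back-translation sketch, not the authors' exact pipeline.
# Assumptions: a reverse-direction (Lithuanian->English) Fairseq
# Transformer is already trained; all paths below are hypothetical.
from fairseq.models.transformer import TransformerModel

lt2en = TransformerModel.from_pretrained(
    "checkpoints/lt-en",                 # hypothetical checkpoint dir
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/lt-en",
)

# Translate monolingual Lithuanian into synthetic English; the
# (synthetic English, real Lithuanian) pairs then augment the genuine
# parallel data when training the English->Lithuanian model.
with open("mono.lt") as mono, open("synthetic.en", "w") as out:
    for line in mono:
        out.write(lt2en.translate(line.strip()) + "\n")
```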

Cited by 7 publications (3 citation statements) · References 13 publications
“…2.5.16 GTCOM (Bei et al., 2019) GTCOM's systems (GTCOM-Primary) mainly focus on backtranslation, knowledge distillation and reranking to build a competitive model with transformer architecture. A language model is also applied to filter monolingual data, backtranslated data and parallel data.…”
Section: Frank-s-MT
Citation type: mentioning; confidence: 99%
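The language-model filtering mentioned here can be sketched as a perplexity threshold. This is a hedged illustration, assuming a KenLM model trained on clean text; KenLM, the paths and the threshold are our assumptions, since the snippet does not name the LM toolkit.

```python
# Sketch of perplexity-based data filtering (assumed KenLM backend).
import kenlm

lm = kenlm.Model("clean.arpa.bin")  # hypothetical LM path

def keep(sentence: str, max_ppl: float = 500.0) -> bool:
    # Low perplexity = the LM finds the sentence fluent; the same test
    # can screen monolingual, back-translated and parallel data.
    return lm.perplexity(sentence) <= max_ppl

with open("corpus.txt") as fin, open("corpus.filtered.txt", "w") as fout:
    for line in fin:
        if keep(line.strip()):
            fout.write(line)
```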
“…Comparing Row-11 with its corresponding full-BT baseline in Row-8, we see that this helps for gu→en, giving a further boost of +0.8 for a final BLEU score of 20.8. To the best of our knowledge, this outperforms the bilingual SoTA performance for gu→en (Bei et al., 2019). To summarize, except for hi→en, Iterative-BT significantly improves Hinted BT.…”
Section: Iterative HintedBT
Citation type: mentioning; confidence: 61%
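Iterative back-translation, as referenced above, alternates between the two translation directions, each round regenerating synthetic data with the latest models. The following is a structural sketch of our reading of the technique, not the cited paper's exact recipe; `train` is a hypothetical callable that fits a model on sentence pairs and returns a translate function.

```python
# Structural sketch of iterative back-translation.
from typing import Callable, List, Tuple

Pair = Tuple[str, str]            # (source sentence, target sentence)
Translator = Callable[[str], str]

def iterative_bt(
    train: Callable[[List[Pair]], Translator],  # hypothetical trainer
    parallel: List[Pair],
    mono_src: List[str],
    mono_tgt: List[str],
    rounds: int = 2,
) -> Tuple[Translator, Translator]:
    flip = lambda pairs: [(t, s) for s, t in pairs]
    fwd = train(parallel)         # source -> target
    bwd = train(flip(parallel))   # target -> source
    for _ in range(rounds):
        # Back-translate monolingual target text for the forward model,
        # then monolingual source text for the backward model.
        fwd = train(parallel + [(bwd(t), t) for t in mono_tgt])
        bwd = train(flip(parallel) + [(fwd(s), s) for s in mono_src])
    return fwd, bwd
```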
“…We retained the top 80% of sentence pairs, across all directions, based on the alignment score (a score generated by the word alignment model that measures the quality of word alignment between source and target sentences). We then trained the Transformer model for all languages using Fairseq, following an approach similar to that of Bei et al. (2019). The scores were calculated as follows:…”
Section: Bitext Data
Citation type: mentioning; confidence: 99%
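The 80% retention step can be sketched as a simple rank-and-cut. We assume one alignment score per pair, where higher means better aligned (e.g. a log-probability from a word aligner such as fast_align; the snippet above does not name the aligner, and the example sentences are hypothetical).

```python
# Sketch of retaining the top 80% of pairs by alignment score.
from typing import List, Tuple

Scored = Tuple[str, str, float]   # (source, target, alignment score)

def retain_top(pairs: List[Scored], keep_ratio: float = 0.8) -> List[Scored]:
    # Rank best-aligned first and keep the top fraction.
    ranked = sorted(pairs, key=lambda p: p[2], reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]

# Usage: the misaligned pair ranks last and falls below the cutoff.
corpus = [
    ("labas rytas", "good morning", -1.2),
    ("aciu", "thank you", -1.5),
    ("suo loja", "the market fell", -9.7),  # misaligned
]
kept = retain_top(corpus)   # keeps int(3 * 0.8) = 2 pairs
```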