“…We employ ratio-based filters on tokenized sentence pairs following Cruz and Sutawika (2022) and Sutawika and Cruz (2021). We first tokenize using SacreMoses, then apply the following ratio-based filters:

                              Sentence Pairs   Source Tokens    Target Tokens
    …                         7,143,725        115,239,312      95,954,020
    Synthetic he→en           73,278,018       1,471,827,973    1,056,677,671
    Synthetic he→en Filtered  47,372,416       659,409,236      541,376,459

Table 1: Corpus Statistics.…”
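The filtering step described above can be sketched as follows. This is a minimal illustration of a length-ratio filter on pre-tokenized sentence pairs; the excerpt does not list the actual filters or thresholds used, so the ratio bound `MAX_RATIO` below is a hypothetical placeholder, not the authors' value.

```python
# Sketch of a ratio-based filter over tokenized sentence pairs.
# MAX_RATIO is a hypothetical threshold for illustration only.

MAX_RATIO = 2.0  # hypothetical bound on source/target token-length ratio


def length_ratio_ok(src_tokens, tgt_tokens, max_ratio=MAX_RATIO):
    """Keep a pair only if neither side is more than max_ratio times longer."""
    if not src_tokens or not tgt_tokens:
        return False  # drop empty sides outright
    ratio = len(src_tokens) / len(tgt_tokens)
    return 1.0 / max_ratio <= ratio <= max_ratio


def filter_corpus(pairs):
    """Yield only the (src, tgt) token-list pairs that pass the ratio filter."""
    for src, tgt in pairs:
        if length_ratio_ok(src, tgt):
            yield src, tgt


pairs = [
    (["hello", "world"], ["shalom", "olam"]),          # ratio 1.0 -> kept
    (["a"], ["one", "two", "three", "four", "five"]),  # ratio 0.2 -> dropped
]
kept = list(filter_corpus(pairs))
```

In practice such a filter runs after tokenization (e.g. with SacreMoses) and before any deduplication, which is consistent with the reduction from the unfiltered to the filtered synthetic corpus in Table 1.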