2020
DOI: 10.1609/aaai.v34i05.6369

Simplify-Then-Translate: Automatic Preprocessing for Black-Box Translation

Abstract: Black-box machine translation systems have proven incredibly useful for a variety of applications yet by design are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system, which is used to train a paraphrase model that “simplifies” the origin…
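The sketch below is a rough, non-authoritative reading of the pipeline the abstract describes: a trained paraphrase model rewrites the source sentence before it reaches the unmodified black-box translator. All names (`simplify`, `translate_black_box`, `simplify_then_translate`) are hypothetical stand-ins, not identifiers from the paper.

```python
# Minimal sketch of simplify-then-translate, assuming only black-box access
# to the MT system. All names here are hypothetical stand-ins.

def translate_black_box(text: str, src: str, tgt: str) -> str:
    """Stand-in for an unmodifiable MT API (e.g., a commercial service)."""
    raise NotImplementedError("call the MT provider here")

def simplify(text: str) -> str:
    """Stand-in for the paraphrase model trained to simplify the source."""
    raise NotImplementedError("run the trained paraphrase model here")

def simplify_then_translate(source: str, src_lang: str, tgt_lang: str) -> str:
    # 1) Automatic pre-processing (APP): rewrite the source into a simpler
    #    paraphrase that the black-box system handles more reliably.
    simplified = simplify(source)
    # 2) Translate the simplified sentence with the untouched black-box system.
    return translate_black_box(simplified, src_lang, tgt_lang)
```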

Cited by 21 publications (14 citation statements)
References 16 publications
“…Similar to machine translation, back-translation is used to improve the performance of neural SS methods (Katsuta and Yamamoto, 2019; Palmero Aprosio et al., 2019; Qiang and Wu, 2021). Mehta et al. (2020) trained a paraphrasing model on a paraphrase corpus generated through back-translation; the model is used to preprocess source sentences of low-resource language pairs before they are fed into the NMT system.…”
Section: Paraphrase Mining (mentioning)
confidence: 99%
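As a companion to the statement above, here is a minimal sketch of how such a paraphrase corpus could be generated by round-tripping in-domain sentences through the black-box system. The pairing direction (original sentence as input, round-trip output as the “simplified” target) is an assumption, and `translate_black_box` is the same hypothetical stub as in the sketch under the abstract.

```python
# Minimal sketch: build (input, target) training pairs for a simplification
# model via round-trip back-translation. Hypothetical names throughout.
from typing import Iterable, List, Tuple

def translate_black_box(text: str, src: str, tgt: str) -> str:
    """Stand-in for the unmodifiable black-box MT API (assumption)."""
    raise NotImplementedError("call the MT provider here")

def build_paraphrase_corpus(sentences: Iterable[str], src_lang: str,
                            pivot_lang: str) -> List[Tuple[str, str]]:
    pairs = []
    for original in sentences:
        # Round trip: source -> pivot language -> back to source language.
        pivot = translate_black_box(original, src_lang, pivot_lang)
        round_trip = translate_black_box(pivot, pivot_lang, src_lang)
        # MT output tends to be simpler and more normalized, so the round-trip
        # sentence serves as the "simplified" target (assumed direction).
        pairs.append((original, round_trip))
    return pairs
```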
“…While the reordering approach has generally proven effective for SMT, its effectiveness for NMT is not obvious; negative effects have even been reported (Zhu, 2015; Du and Way, 2017). In recent years, techniques of automatic text simplification have been applied to improve NMT outputs (Štajner and Popović, 2018; Mehta et al., 2020). The underlying assumption of these studies is that simpler sentences are more machine translatable.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, the feasibility and possibility of pre-editing for neural MT (NMT) has not been examined extensively. While efforts have recently been invested in the implementation of pre-editing strategies for black-box NMT settings, achieving improved MT quality (e.g., Hiraoka and Yamada, 2019; Mehta et al., 2020), the potential gains of pre-editing remain unexplored. Notably, the impact of pre-editing on black-box MT is unpredictable in nature.…”
Section: Introduction (mentioning)
confidence: 99%
“…These experiments relate to a large body of work that considers how preprocessing methods affect the downstream accuracy of various algorithms, ranging from topics in information retrieval (Chaudhari et al., 2015; Patil and Atique, 2013; Beil et al., 2002), text classification and regression (Forman, 2003; Yang and Pedersen, 1997; Vijayarani et al., 2015; Kumar and Harish, 2018; HaCohen-Kerner et al., 2020; Symeonidis et al., 2018; Weller et al., 2020), topic modeling (Blei et al., 2003; Lund et al., 2019; Schofield and Mimno, 2016; Schofield et al., 2017a,b), and even more complex tasks like question answering (Jijkoun et al., 2003; Carvalho et al., 2007) and machine translation (Habash, 2007; Habash and Sadat, 2006; Leusch et al., 2005; Weller et al., 2021; Mehta et al., 2020), to name a few. With the rise of noisy social media, text preprocessing has become important for tasks that use data from sources like Twitter and Reddit (Symeonidis et al., 2018; Singh and Kumari, 2016; Bao et al., 2014; Jianqiang, 2015; Weller and Seppi, 2020; Zirikly et al., 2019; Babanejad et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%