Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.267
AugVic: Exploiting BiText Vicinity for Low-Resource NMT

Abstract: The success of Neural Machine Translation (NMT) largely depends on the availability of large bitext training corpora. Due to the lack of such large corpora in low-resource language pairs, NMT systems often exhibit poor performance. Extra relevant monolingual data often helps, but acquiring it could be quite expensive, especially for low-resource languages. Moreover, domain mismatch between bitext (train/test) and monolingual data might degrade the performance. To alleviate such issues, we propose AUGVIC, a nov…

Cited by 4 publications (4 citation statements); all citing publications appeared in 2022.
References 43 publications.
“…Kiyono 2021, Karpukhin et al 2019). An advantage of soft replacements over hard ones is that they take into account the context of the tokens being replaced (Liu et al, 2021;Mohiuddin et al, 2021). These methods require architectural changes to a model whereas CipherDAug does not.…”
Section: Related Work
confidence: 99%
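The statement above contrasts "hard" replacements (swapping a token without regard to context) with "soft" replacements (mixing candidate tokens weighted by how well each fits the context). The following toy sketch illustrates that distinction; it is not code from any of the cited papers, and the one-hot embeddings and bigram-style scorer are stand-ins for a real language model:

```python
import random

# Toy vocabulary with one-hot embeddings (a stand-in for learned embeddings).
VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat"]
EMB = {w: [float(i == j) for j in range(len(VOCAB))] for i, w in enumerate(VOCAB)}


def hard_replace(tokens, pos, rng):
    """Hard replacement: swap tokens[pos] for a uniformly random vocab word,
    ignoring the surrounding context entirely."""
    out = list(tokens)
    out[pos] = rng.choice(VOCAB)
    return out


def soft_replace(tokens, pos, score_fn):
    """Soft replacement: return an *expected* embedding for position `pos`,
    weighting every candidate by a context-conditioned score."""
    weights = [score_fn(tokens, pos, w) for w in VOCAB]
    total = sum(weights)
    probs = [w / total for w in weights]
    dim = len(VOCAB)
    return [sum(p * EMB[w][d] for p, w in zip(probs, VOCAB)) for d in range(dim)]


def toy_score(tokens, pos, candidate):
    # Hypothetical contextual scorer: after "the", nouns are more plausible.
    prev = tokens[pos - 1] if pos > 0 else None
    return 5.0 if prev == "the" and candidate in ("cat", "dog", "mat") else 1.0


rng = random.Random(0)
sent = ["the", "cat", "sat", "on", "the", "mat"]
hard = hard_replace(sent, 1, rng)   # context-free swap at position 1
vec = soft_replace(sent, 1, toy_score)
```

The soft variant never commits to a single discrete token; it feeds a probability-weighted mixture of embeddings downstream, which is why (as the statement notes) it requires architectural changes to the model that consumes it.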
“…In Chapter 3 we leverage language models to generate synthetic labeled sequences for data augmentation in sequence tagging tasks. Data augmentation has also proven useful in cross-lingual settings [68][69][70][71][72][73]. Moreover, most existing methods overlook better utilization of multilingual training data when such resources are available, so in Chapter 3 we explore methods that exploit translation models and multilingual resources for data augmentation.…”
Section: Data Augmentation
confidence: 99%
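The statement above describes generating synthetic labeled sequences for sequence-tagging augmentation. A minimal sketch of that idea, under stated assumptions: only "O"-tagged (non-entity) tokens are replaced so the label sequence stays valid, and a hypothetical substitution table stands in for a real language model's fill-in candidates. None of this is the thesis's actual method:

```python
import random

# Hypothetical context-plausible substitutes (a real system would query an LM).
SUBSTITUTES = {
    "visited": ["toured", "reached"],
    "yesterday": ["recently", "today"],
}


def augment(tokens, tags, rng, p=0.5):
    """Return a new (tokens, tags) pair. Only tokens tagged 'O' may be
    replaced, so entity spans and the tag sequence are preserved."""
    new_tokens = []
    for tok, tag in zip(tokens, tags):
        if tag == "O" and tok in SUBSTITUTES and rng.random() < p:
            new_tokens.append(rng.choice(SUBSTITUTES[tok]))
        else:
            new_tokens.append(tok)
    return new_tokens, list(tags)


rng = random.Random(1)
toks = ["Alice", "visited", "Paris", "yesterday"]
tags = ["B-PER", "O", "B-LOC", "O"]
aug_toks, aug_tags = augment(toks, tags, rng, p=1.0)
```

Because labels are copied unchanged and entity tokens are never touched, each synthetic sentence is a drop-in training example for the tagger.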