“…The pretrain-finetune paradigm has been highly successful in tackling challenging problems in natural language processing, e.g., domain adaptation (Sato et al., 2020; Yao et al., 2020), incremental learning (Khayrallah et al., 2018; Wan et al., 2020), and knowledge transfer (Liu et al., 2020b). The rise of large-scale pre-trained language models has further drawn attention to this strategy (Devlin et al., 2019; Edunov et al., 2019).…”