Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) 2019
DOI: 10.18653/v1/k19-1030

Improving Pre-Trained Multilingual Model with Vocabulary Expansion

Abstract: Recently, pre-trained language models have achieved remarkable success in a broad range of natural language processing tasks. However, in a multilingual setting, it is extremely resource-consuming to pre-train a deep language model over large-scale corpora for each language. Instead of exhaustively pre-training monolingual language models independently, an alternative solution is to pre-train a powerful multilingual deep language model over large-scale corpora in hundreds of languages. However, the vocabulary si…
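As a rough illustration of the vocabulary-expansion setting the abstract describes, the sketch below adds new wordpieces to a pre-trained multilingual BERT tokenizer and resizes the embedding matrix accordingly. It assumes the Hugging Face transformers library and hypothetical example tokens; it is a generic sketch, not the paper's own approach.

```python
# Minimal sketch of vocabulary expansion, assuming the Hugging Face
# "transformers" library; a generic illustration, not the paper's method.
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

# Hypothetical out-of-vocabulary words for some target language or domain.
new_tokens = ["wortschatz", "erweiterung"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new tokens receive (randomly initialized)
# rows; in practice these rows must then be learned or initialized from
# aligned vectors.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```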

Cited by 18 publications (18 citation statements)
References 58 publications (50 reference statements)
“…Our vector space alignment strategy is inspired by cross-lingual word vector alignment (e.g., Mikolov et al. (2013b); Smith et al. (2017)). A related method was recently applied by Wang et al. (2019a) to map cross-lingual word vectors into the multilingual BERT wordpiece vector space.…”
Section: Vector Space Alignment
Citation type: mentioning (confidence: 99%)
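The cross-lingual alignment this excerpt refers to (Mikolov et al., 2013b; Smith et al., 2017) learns a linear map between two embedding spaces from a seed dictionary of translation pairs. Below is a minimal NumPy sketch of the orthogonal (Procrustes) variant; the variable names and random data are illustrative assumptions, not the cited papers' code.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal map W minimizing ||X @ W - Y||_F (Procrustes solution).

    X, Y: (n, d) arrays whose i-th rows embed the same seed-dictionary
    entry in the source and target vector spaces, respectively.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Illustrative data: 5,000 hypothetical translation pairs in 300-d spaces.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))
Y = rng.normal(size=(5000, 300))

W = procrustes_align(X, Y)
mapped = X @ W  # source-space vectors expressed in the target space
```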
“…Moreover, SciBERT (Beltagy et al., 2019) found that an in-domain vocabulary is helpful but not significantly so; we attribute this to the inefficiency of implicitly learning an in-domain vocabulary. To represent OOV words in multilingual settings, the mixture mapping method (Wang et al., 2019) utilized a mixture of English subword embeddings, but it has been shown to be ineffective for domain-specific words by Tai et al. (2020). ExBERT (Tai et al., 2020) applied an extension module to adapt an augmenting embedding for the in-domain vocabulary, but it still requires extensive continued pre-training.…”
Section: Related Work
Citation type: mentioning (confidence: 99%)
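For context, the mixture mapping idea mentioned in this excerpt represents a word outside the pre-trained vocabulary as a weighted combination of existing subword embeddings. The sketch below is a simplified, similarity-weighted version with softmax weights; the alignment step that Wang et al. (2019) perform before mixing, and the temperature value, are assumptions made for illustration.

```python
import numpy as np

def mixture_map(word_vec, subword_embs, temperature=0.1):
    """Approximate an out-of-vocabulary word as a softmax-weighted mixture
    of pre-trained subword embeddings (simplified sketch).

    word_vec: (d,) vector for the OOV word, assumed already mapped into
              the subword embedding space.
    subword_embs: (V, d) pre-trained subword embedding matrix.
    """
    sims = subword_embs @ word_vec / (
        np.linalg.norm(subword_embs, axis=1) * np.linalg.norm(word_vec) + 1e-9
    )
    weights = np.exp(sims / temperature)
    weights /= weights.sum()
    return weights @ subword_embs  # (d,) mixture embedding

# Illustrative usage with random data (768-d embeddings, 1,000 subwords).
rng = np.random.default_rng(0)
emb = mixture_map(rng.normal(size=768), rng.normal(size=(1000, 768)))
```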
“…Here, we add new tokens to the vocabulary and increase the model size, motivated by prior work [29]. Since this increases the number of network parameters, these models are used as a secondary baseline to be compared with the surrogates.…”
Section: Additional Tokens
Citation type: mentioning (confidence: 99%)
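The parameter growth this excerpt treats as a secondary baseline is easy to quantify: each token appended to the vocabulary adds one embedding row of hidden_size parameters. A small sketch, again assuming the Hugging Face transformers API and a hypothetical count of 1,000 new tokens:

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-multilingual-cased")
before = model.num_parameters()

# Hypothetical: append 1,000 new tokens to the vocabulary.
n_new = 1000
model.resize_token_embeddings(model.config.vocab_size + n_new)
after = model.num_parameters()

# The increase equals n_new * hidden_size (768 for multilingual BERT).
print(f"Parameters: {before:,} -> {after:,} (+{after - before:,})")
```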