Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.4

CharBERT: Character-aware Pre-trained Language Model

Abstract: Most pre-trained language models (PLMs) construct word representations at the subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocabulary) words are almost entirely avoided. However, these methods split a word into subword units, making the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT, improving on previous methods (such as BERT and RoBERTa) to tackle these problems. We first construct the contextual word embedding…
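
As a rough illustration of the subword fragility the abstract refers to, the sketch below (not from the paper; it assumes the `transformers` library with the standard `bert-base-uncased` WordPiece vocabulary, and the example words are arbitrary) tokenizes a word, an inflected form, and a misspelling:

```python
# Minimal sketch of the "incomplete and fragile" subword issue described in
# the abstract. Assumes the `transformers` library is installed and downloads
# the standard "bert-base-uncased" WordPiece vocabulary (a BPE variation).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# correct word, inflected form, and a misspelling (arbitrary example words)
for word in ["tokenization", "tokenizations", "tokeniaztion"]:
    pieces = tokenizer.tokenize(word)
    print(f"{word!r:16} -> {pieces}")

# Exact splits depend on the learned vocabulary; typically the correct word
# maps to one or two pieces, while a single transposed character shatters the
# misspelling into several short, unrelated fragments.
```

That sensitivity of the subword decomposition to small character-level changes is the fragility CharBERT is designed to mitigate.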

Cited by 41 publications (36 citation statements) | References 27 publications
“…Bostrom and Durrett (2020) pretrain RoBERTa with different tokenization methods and find tokenizations that align more closely with morphology to perform better on a number of tasks. Ma et al. (2020) show that providing BERT with character-level information also leads to enhanced performance. Relatedly, studies from automatic speech recognition have demonstrated that morphological decomposition improves the perplexity of language models (Fang et al., 2015; Jain et al., 2020).…”
Section: Related Work
mentioning, confidence: 99%
“…Several works propose to optimize subword-sensitive word encoding methods for pretrained language models. Ma et al. (2020) use convolutional neural networks (Kim, 2014) on characters to calculate word representations. Zhang and Li (2020) propose to add phrases into the vocabulary for Chinese pretrained language models.…”
Section: Related Work
mentioning, confidence: 99%
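
The statement above describes a Kim (2014)-style character CNN for building word representations. The PyTorch sketch below is a generic illustration of that idea under assumed shapes and hyperparameters; it is not the authors' released code, and CharBERT's own character module differs in detail.

```python
# Minimal sketch (assumed names and dimensions) of a character CNN that maps
# the characters of one word to a fixed-size word vector via convolution and
# max-over-time pooling, as in the citation statement above.
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    def __init__(self, n_chars=256, char_dim=16, n_filters=128, kernel_size=3):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size, padding=1)

    def forward(self, char_ids):             # (batch, word_len) character IDs
        x = self.char_emb(char_ids)          # (batch, word_len, char_dim)
        x = x.transpose(1, 2)                # (batch, char_dim, word_len)
        x = torch.relu(self.conv(x))         # (batch, n_filters, word_len)
        return x.max(dim=-1).values          # max-over-time pool -> (batch, n_filters)

encoder = CharCNNWordEncoder()
word = "misspeling"
ids = torch.tensor([[min(ord(c), 255) for c in word]])  # naive character-to-ID mapping
print(encoder(ids).shape)                    # torch.Size([1, 128])
```

Because every character contributes to the pooled vector, a single typo perturbs the representation only slightly instead of changing the subword decomposition entirely.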
“…Bostrom and Durrett (2020) empirically compare several popular word segmentation algorithms for pretrained language models of a single language. Several works propose to use different representation granularities, such as phrase-level segmentation (Zhang and Li, 2020) or character-aware representations (Ma et al., 2020) for pretrained language models of a single high-resource language, such as English or Chinese only. However, it is not a foregone conclusion that methods designed and tested on monolingual models will be immediately applicable to multilingual representations.…”
Section: Introduction
mentioning, confidence: 99%
“…Furthermore, the authors claim that it is more robust to noise and misspellings. In the same vein, Ma et al. (2020a) combined character-aware and subword-based information to improve robustness to spelling errors. This initiated a new wave of tokenizer-free models based on characters or bytes (Tay et al., 2021; Xue et al., 2021; Clark et al., 2021).…”
Section: Tokenization and Character-based Models
mentioning, confidence: 99%
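
For context on the tokenizer-free models mentioned above, the toy sketch below shows the general byte-level input idea: text is mapped directly to UTF-8 byte IDs rather than subword IDs, so a misspelling only perturbs a few positions. The reserved-ID offset is an assumption for illustration and does not reproduce any specific model's exact convention.

```python
# Toy sketch of a byte-level input representation (assumed special-token
# layout, not any specific model's convention).
SPECIAL = {"<pad>": 0, "<eos>": 1, "<unk>": 2}
OFFSET = len(SPECIAL)

def bytes_to_ids(text: str) -> list[int]:
    # Shift raw UTF-8 byte values past the reserved special IDs.
    return [b + OFFSET for b in text.encode("utf-8")]

def ids_to_bytes(ids: list[int]) -> str:
    # Drop special IDs and undo the shift.
    return bytes(i - OFFSET for i in ids if i >= OFFSET).decode("utf-8", errors="replace")

ids = bytes_to_ids("CharBERT")
print(ids)                # [70, 107, 100, 117, 69, 72, 85, 87]
print(ids_to_bytes(ids))  # 'CharBERT'
```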