Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models

Lai, Yongqing; Liu, Yijia; Feng, Yansong; Huang, Songfang; Zhao, Dongyan

doi:10.48550/arxiv.2104.07204

Cited by 4 publications

(7 citation statements)

References 23 publications

(16 reference statements)


“…Existing Chinese BERT models that incorporate word information can be divided into two categories. The first category uses word information in the pretraining stage but represents a text as a sequence of characters when the pretrained model is applied to downstream tasks (Cui et al, 2019a; Lai et al, 2021). The second category uses word information when the pretrained model is used in downstream tasks (Su, 2020; Guo et al, 2021).…”

Section: Related Work (mentioning)

confidence: 99%

“…• Lattice BERT: Lai et al (2021) uses lexicons to enhance the character-level encodings (left side of the encoder in Figure 3(b)). It uses the parallel structure in the transformers to discriminate characters and additional lexicons.…”

Section: Appendix A Word-level Chinese BERT Models (mentioning)

confidence: 99%

“…-Lattice-BERT (Lai et al, 2021): the state-of-the-art multi-granularity model that uses lexicons as word-level knowledge concatenated to the original input context.…”

Section: Experiments On Language Understanding Task (mentioning)

confidence: 99%

“…There are several attempts at building Chinese BERT models where word information is considered. Existing studies tokenize a word as a basic unit (Su, 2020), as multiple characters (Cui et al, 2019a) or a combination of both (Lai et al, 2021; Guo et al, 2021). However, due to the limit of the vocabulary size of BERT, these models only learn for a limited number (e.g., 40K) of words with high frequency.…”

Section: Introduction (mentioning)

confidence: 99%


MarkBERT: Marking Word Boundaries Improves Chinese BERT

Li, Dai, Tang, et al. 2022

Preprint

We present a Chinese BERT model dubbed MarkBERT that uses word information. Existing word-based BERT models regard words as basic units; however, due to the vocabulary limit of BERT, they only cover high-frequency words and fall back to the character level when encountering out-of-vocabulary (OOV) words. Unlike existing works, MarkBERT keeps a vocabulary of Chinese characters and inserts boundary markers between contiguous words. This design enables the model to handle all words in the same way, whether or not they are OOV words. In addition, our model has two further benefits: first, it is convenient to add word-level learning objectives over markers, which is complementary to traditional character-level and sentence-level pre-training tasks; second, it can easily incorporate richer semantics such as POS tags of words by replacing generic markers with POS tag-specific markers. MarkBERT pushes the state of the art in Chinese named entity recognition from 95.4% to 96.5% on the MSRA dataset and from 82.8% to 84.2% on the OntoNotes dataset. Compared to previous word-based BERT models, MarkBERT achieves better accuracy on text classification, keyword recognition, and semantic similarity tasks.
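The marker-insertion idea in this abstract can be illustrated with a minimal sketch. It assumes the text has already been segmented into words by some external tool, and the marker string `[S]` and the function name `insert_boundary_markers` are hypothetical choices made for illustration; the abstract does not specify either, and this is not the authors' implementation.

```python
from typing import List

# Hypothetical marker token; the abstract does not name the actual symbol.
MARKER = "[S]"


def insert_boundary_markers(words: List[str], marker: str = MARKER) -> List[str]:
    """Turn pre-segmented words into a character sequence with a marker
    between each pair of contiguous words.

    The vocabulary stays at the character level: every word, frequent or
    out-of-vocabulary, is split into its characters in the same way.
    """
    tokens: List[str] = []
    for i, word in enumerate(words):
        if i > 0:
            tokens.append(marker)  # boundary between the previous word and this one
        tokens.extend(word)        # iterating a str yields its characters
    return tokens


if __name__ == "__main__":
    # Assumed segmentation of a short sentence into three words.
    print(insert_boundary_markers(["我", "喜欢", "北京"]))
```

Because words are only ever represented as characters plus markers, a rare word needs no vocabulary entry of its own, which is the property the abstract emphasizes. The POS-specific variant would swap the generic marker for a tag-dependent one; how tags are assigned to markers is not stated in the abstract, so it is left out here.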


“…Language features are considered in more recent works. For example, AMBERT (Zhang and Li, 2020) and Lattice-BERT (Lai et al, 2021) both take word information into consideration. Chinese-BERT (Sun et al, 2021) utilizes pinyin and glyph of characters.…”

Section: Related Work (mentioning)

confidence: 99%

"Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction

Dai, Li, Zhang, et al. 2022

Preprint

Whole word masking (WWM), which masks all subwords corresponding to a word at once, makes a better English BERT model (Sennrich et al., 2016). For the Chinese language, however, there is no subword because each token is an atomic character. The meaning of a word in Chinese is different in that a word is a compositional unit consisting of multiple characters. This difference motivates us to investigate whether WWM leads to better context understanding ability for Chinese BERT. To achieve this, we introduce two probing tasks related to grammatical error correction and ask pretrained models to revise or insert tokens in a masked language modeling manner. We construct a dataset including labels for 19,075 tokens in 10,448 sentences. We train three Chinese BERT models with standard character-level masking (CLM), WWM, and a combination of CLM and WWM, respectively. Our major findings are as follows: First, when one character needs to be inserted or replaced, the model trained with CLM performs the best. Second, when more than one character needs to be handled, WWM is the key to better performance. Finally, when fine-tuned on sentence-level downstream tasks, models trained with different masking strategies perform comparably.

Notes: Work done during internship at Tencent AI Lab. * indicates equal contributions. † Corresponding author. Footnote 1: Next sentence prediction is the other pretraining task adopted in the original BERT paper. However, it is removed in some later works such as RoBERTa. We do not consider next sentence prediction in this work.
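The difference between character-level masking and whole word masking described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's training code: the input is assumed to be pre-segmented into words, each unit is masked independently with probability `rate`, and the usual BERT refinements (a fixed masking budget, random or unchanged replacements) are omitted. The function names are hypothetical.

```python
import random
from typing import List, Optional

MASK = "[MASK]"


def char_level_mask(words: List[str], rate: float = 0.15,
                    rng: Optional[random.Random] = None) -> List[str]:
    """CLM sketch: every character is masked independently of its neighbours."""
    if rng is None:
        rng = random.Random()
    chars = [c for w in words for c in w]
    return [MASK if rng.random() < rate else c for c in chars]


def whole_word_mask(words: List[str], rate: float = 0.15,
                    rng: Optional[random.Random] = None) -> List[str]:
    """WWM sketch: the masking decision is made per word, and all characters
    of a selected word are masked together."""
    if rng is None:
        rng = random.Random()
    out: List[str] = []
    for w in words:
        if rng.random() < rate:
            out.extend([MASK] * len(w))  # mask every character of the word
        else:
            out.extend(w)                # keep the word's characters as-is
    return out


if __name__ == "__main__":
    words = ["我", "喜欢", "北京"]  # assumed segmentation
    print(char_level_mask(words, rate=0.5, rng=random.Random(0)))
    print(whole_word_mask(words, rate=0.5, rng=random.Random(0)))
```

Under CLM a multi-character word can end up partially masked, so the model may recover a character from the rest of its word; under WWM the whole word must be predicted from the surrounding context, which matches the abstract's distinction between single-character and multi-character cases.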


A Multi-dimension and Multi-granularity Feature Fusion Method for Chinese Microblog Sentiment Classification

Wei, Liu, et al. 2023

Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing
