2019
DOI: 10.48550/arxiv.1909.00204
Preprint

NEZHA: Neural Contextualized Representation for Chinese Language Understanding

Abstract: Pre-trained language models have achieved great success in various natural language understanding (NLU) tasks due to their capacity to capture deep contextualized information in text by pre-training on large-scale corpora. In this technical report, we present our practice of pre-training language models named NEZHA (NEural contextualiZed representation for CHinese lAnguage understanding) on Chinese corpora and fine-tuning them for Chinese NLU tasks. The current version of NEZHA is based on BERT [1] with …
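The abstract describes the standard pre-train-then-fine-tune workflow. As a minimal sketch of the fine-tuning side, assuming the Hugging Face Transformers library and using the public bert-base-chinese checkpoint as a stand-in for a NEZHA-style Chinese encoder (the checkpoint, example sentences, and labels are illustrative, not taken from the report):

```python
# Minimal fine-tuning sketch for a BERT-style Chinese encoder on a
# sentence-classification NLU task. Illustrative only: the checkpoint
# stands in for NEZHA, and the optimizer loop is omitted.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-chinese"  # stand-in for a NEZHA-style Chinese encoder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Two toy sentences (positive / negative) and their labels.
batch = tokenizer(["这部电影很好看", "服务太差了"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

outputs = model(**batch, labels=labels)  # forward pass returns loss and logits
outputs.loss.backward()                  # one backward step; optimizer.step() omitted
```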

Cited by 32 publications (26 citation statements)
References 20 publications
“…To capture the sequential features in languages, previous PLMs adopt position embedding in either input representations (Devlin et al., 2019; Lan et al., 2020) or attention weights (Yang et al., 2019; Wei et al., 2019; Ke et al., 2020). For the input-level position embedding, the inputs of the first layer are $h_i^{\mathrm{in},0} = h_i^{\mathrm{in},0} + P_i$, where $P_i$ is the embedding of the $i$-th position.…”
Section: Lattice-BERT
confidence: 99%
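To make the distinction in this statement concrete, here is a small PyTorch sketch, not drawn from any of the cited papers: part (a) adds a learned absolute position embedding to the first-layer inputs, while part (b) injects positional information inside attention via a learned relative-distance bias on the scores (related in spirit to the attention-level schemes cited, though NEZHA itself uses a functional, sinusoid-based relative encoding):

```python
# Contrast of input-level vs. attention-level positional information.
# Illustrative shapes and parameters; a single attention "layer" is shown.
import torch
import torch.nn as nn

d_model, max_len, seq_len = 64, 128, 16
tokens = torch.randn(2, seq_len, d_model)       # (batch, seq_len, hidden)

# (a) Input-level: h_i^{in,0} = h_i^{in,0} + P_i, added once before layer 1.
P = nn.Embedding(max_len, d_model)
h0 = tokens + P(torch.arange(seq_len))

# (b) Attention-level: a bias indexed by the clipped relative distance j - i,
# added to the q·k scores inside every attention layer instead of to the inputs.
k = 8                                            # clipping window for relative distance
rel_bias = nn.Embedding(2 * k + 1, 1)
idx = torch.arange(seq_len)
dist = (idx[None, :] - idx[:, None]).clamp(-k, k) + k   # (seq, seq), values in [0, 2k]
scores = tokens @ tokens.transpose(-1, -2) / d_model ** 0.5
scores = scores + rel_bias(dist).squeeze(-1)     # bias broadcast over the batch
attn = scores.softmax(dim=-1)
```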
“…From the perspective of reporting strategies, we report the performance of base-size models together with lite-size models. As far as we know, all previous Chinese PLMs only report base- or large-size settings (Wei et al., 2019; Diao et al., 2020; Cui et al., 2020). Thus, followers have to implement at least a 12-layer pre-training model to make a fair comparison.…”
Section: A Ethical Considerations
confidence: 99%
“…In this way, we can obtain ten different base models. In addition, we also replace the encoder of the different models with different pretrained language models, including BERT, RoBERTa-wwm-ext [7], and NEZHA [15]. Accordingly, other kinds of base models can be trained.…”
Section: Model Enhancement Techniques
confidence: 99%
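A minimal sketch of this encoder-swapping idea, assuming the Hugging Face Transformers library; the checkpoint identifiers are illustrative (a NEZHA checkpoint would be added the same way if one is available in your environment), and the classifier head is a generic stand-in for the cited paper's task-specific model:

```python
# Train the same task head on top of interchangeable pretrained encoders.
import torch.nn as nn
from transformers import AutoModel

encoder_checkpoints = [
    "bert-base-chinese",            # BERT
    "hfl/chinese-roberta-wwm-ext",  # RoBERTa-wwm-ext
    # a NEZHA checkpoint could be listed here as well, if available
]

class Classifier(nn.Module):
    """Task-specific head on top of an interchangeable pretrained encoder."""
    def __init__(self, checkpoint: str, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, **batch):
        hidden = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] vector
        return self.head(hidden)

# One base model per encoder; each can then be fine-tuned and ensembled.
models = [Classifier(ckpt) for ckpt in encoder_checkpoints]
```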
“…The statistics of these datasets are presented in Experimental Settings. Among these text classification datasets, we use the pre-trained model NEZHA-base (Wei et al., 2019) as our baseline on the Chinese dataset iflytek, and BERT-base (Devlin et al., 2018) as our baseline model on the other English datasets. According to the text length distribution of the different datasets, as well as the maximum sequence length allowed by BERT, we set the hyperparameters as shown in Table 4.…”
confidence: 99%
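One way to operationalize that choice, as a hedged sketch rather than the cited authors' actual procedure: pick each dataset's maximum sequence length from its token-length distribution and cap it at the 512-token limit of BERT-style encoders. The percentile rule, helper name, and checkpoint below are illustrative assumptions:

```python
# Illustrative helper for choosing a per-dataset max sequence length from
# the token-length distribution, capped at BERT's 512-token limit.
# The percentile, cap, and checkpoint are assumptions, not values from the paper.
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")

def choose_max_length(texts, percentile=95, cap=512):
    lengths = [len(tokenizer.encode(t)) for t in texts]
    return int(min(np.percentile(lengths, percentile), cap))

# Usage: max_len = choose_max_length(train_texts), then pass
# tokenizer(batch, truncation=True, padding="max_length", max_length=max_len).
```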