2019
DOI: 10.48550/arxiv.1904.09223
Preprint

ERNIE: Enhanced Representation through Knowledge Integration

Abstract: We present a novel language representation model enhanced by knowledge called ERNIE (Enhanced Representation through kNowledge IntEgration). Inspired by the masking strategy of BERT (Devlin et al., 2018), ERNIE is designed to learn language representation enhanced by knowledge masking strategies, which includes entity-level masking and phrase-level masking. Entity-level strategy masks entities which are usually composed of multiple words. Phrase-level strategy masks the whole phrase which is composed of several…
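To make the abstract's distinction concrete, here is a minimal sketch (not the authors' code) contrasting BERT-style random token masking with the knowledge masking described above, where entity-level and phrase-level spans are masked as whole units. The span boundaries below are hand-labelled assumptions; in ERNIE they come from entity recognition and phrase analysis applied during data preparation.

```python
# Sketch: BERT-style random token masking vs ERNIE-style knowledge masking.
# Span boundaries are illustrative assumptions, not output of real NER/chunking.
import random

tokens = ["New", "York", "is", "the", "largest", "city", "in", "the", "United", "States"]
entity_spans = [(0, 2), (8, 10)]   # "New York", "United States"
phrase_spans = [(3, 6)]            # "the largest city" (assumed chunking)

def bert_style_mask(tokens, prob=0.15):
    """Basic MLM: each token is masked independently at random."""
    return [t if random.random() > prob else "[MASK]" for t in tokens]

def knowledge_mask(tokens, spans):
    """Mask every token of one randomly chosen multi-word span,
    so the model must predict the whole unit from its context."""
    masked = list(tokens)
    start, end = random.choice(spans)
    for i in range(start, end):
        masked[i] = "[MASK]"
    return masked

print(bert_style_mask(tokens))               # e.g. ['New', '[MASK]', 'is', ...]
print(knowledge_mask(tokens, entity_spans))  # e.g. ['[MASK]', '[MASK]', 'is', ...]
print(knowledge_mask(tokens, phrase_spans))  # ['New', 'York', 'is', '[MASK]', '[MASK]', '[MASK]', ...]
```

In the entity-level and phrase-level cases the model cannot rely on the remaining tokens of the masked unit, so it must recover the whole entity or phrase from the surrounding context.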

Cited by 337 publications (290 citation statements)
References 15 publications
“…Besides, MLM randomly masks out some independent words, which are the smallest semantic units in English but may not have complete semantics in other languages, such as Chinese. Thus, ERNIE (Baidu) (Sun et al., 2019b) introduces entity-level and phrase-level masking, where multiple words that represent the same semantic meaning are masked. This achieves good transferability on Chinese NLP tasks.…”
Section: Generative Learning
Citation type: mentioning; confidence: 99%
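The statement above stresses that in Chinese the basic BERT unit is a single character, which is often not a complete semantic unit. The following rough sketch illustrates that point; the character segmentation and the entity span are assumptions chosen for the example, not output of a real tokenizer.

```python
# Illustration: character-level masking vs entity-level masking in Chinese.
# "哈利波特是一系列奇幻小说" ~ "Harry Potter is a series of fantasy novels".
chars = list("哈利波特是一系列奇幻小说")   # character-level tokens
entity = (0, 4)                            # "哈利波特" (Harry Potter) as one entity (assumed span)

# Masking a single character leaves "哈[MASK]波特", which is trivially
# recoverable from the other characters of the entity itself.
char_masked = chars.copy()
char_masked[1] = "[MASK]"

# Masking the whole entity forces the model to recover it from context.
entity_masked = chars.copy()
for i in range(*entity):
    entity_masked[i] = "[MASK]"

print("".join(char_masked))    # 哈[MASK]波特是一系列奇幻小说
print("".join(entity_masked))  # [MASK][MASK][MASK][MASK]是一系列奇幻小说
```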
“…(Peters et al., 2018) serves as the baseline. GPT (Radford et al., 2018), BERT Large (Devlin et al., 2019), T5 (Raffel et al., 2020), and ERNIE (Sun et al., 2019b) have different architectures. RoBERTa, XLM (Lample and Conneau, 2019), and SpanBERT (Joshi et al., 2020) share the same architecture as BERT Large but employ different pre-training methods.…”
Section: Pre-training
Citation type: mentioning; confidence: 99%
“…Contrastively, our work is integrating knowledge from a large MoE model. Sun et al. (2019) proposed to integrate knowledge by using knowledge masking strategies. Please note our knowledge integration is different from theirs.…”
Section: Knowledge Integration
Citation type: mentioning; confidence: 99%
“…Task-aware Language models. A recent line of works has been focused on bridging the gap between the self-supervision task and the downstream tasks which is inherent to multi-purpose pretrained models (Sun et al. 2019; Tian et al. 2020; Chang et al. 2020). In (Joshi et al. 2020), spans of texts are masked rather than single tokens, resulting in a language model oriented to span-selection tasks.…”
Section: Related Work
Citation type: mentioning; confidence: 99%
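The last statement contrasts knowledge masking with masking contiguous spans of text rather than single tokens (Joshi et al. 2020). As a hedged illustration of that span-masking idea (not the SpanBERT implementation), the sketch below masks random contiguous spans, with span lengths drawn from a geometric distribution, until a masking budget is reached; the budget and parameter values are assumptions for the example.

```python
# Sketch of span masking: mask contiguous spans until ~15% of tokens are masked.
import numpy as np

def span_mask(tokens, mask_budget=0.15, p=0.2, max_span=10, rng=None):
    rng = rng or np.random.default_rng()
    masked = list(tokens)
    n_to_mask = max(1, int(len(tokens) * mask_budget))
    n_masked = 0
    while n_masked < n_to_mask:
        span_len = min(int(rng.geometric(p)), max_span)   # geometric span length
        start = int(rng.integers(0, len(tokens)))          # random span start
        for i in range(start, min(start + span_len, len(tokens))):
            if masked[i] != "[MASK]":
                masked[i] = "[MASK]"
                n_masked += 1
    return masked

print(span_mask("the quick brown fox jumps over the lazy dog".split()))
```

Compared with the knowledge masking sketched earlier, the masked unit here is defined purely by position and length rather than by entity or phrase boundaries.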