ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Preprint, 2020
DOI: 10.48550/arxiv.2003.10555

Cited by 396 publications (585 citation statements)
References 0 publications
“…Future work on acronym extraction can explore model adaptability to other domains and can attempt to capture acronym-long-form interactions better during their extraction. We use the base version of mBERT for all our experiments; larger and specialized models such as RoBERTa (Liu et al. 2019), ELECTRA (Clark et al. 2020), LegalBERT (Chalkidis et al. 2020), etc. can also be tested.…”
Section: Discussion (mentioning)
confidence: 99%
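The backbone swap suggested in that future-work note is straightforward with the Hugging Face transformers Auto* classes. The sketch below is illustrative and not from the cited paper; the checkpoint list and the token-classification head (e.g. three labels for BIO-style acronym tagging) are assumptions.

```python
# A minimal sketch (not the cited authors' code) of swapping the mBERT backbone
# for larger or specialized encoders via the Hugging Face transformers API.
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical candidate checkpoints; the cited work only uses mBERT-base.
CANDIDATES = [
    "bert-base-multilingual-cased",        # mBERT baseline
    "roberta-base",                        # RoBERTa (Liu et al. 2019)
    "google/electra-base-discriminator",   # ELECTRA (Clark et al. 2020)
]

def load_encoder(checkpoint: str, num_labels: int = 3):
    """Load a tokenizer/encoder pair for token-level (e.g. BIO) acronym tagging."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint, num_labels=num_labels
    )
    return tokenizer, model
```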
“…Specifically, we replace 25% of the tokens in the anchor with random tokens sampled from the vocabulary or a special token such as [MASK]. Such token replacements have been hugely successful in masked language modeling [11,38] and force the model to distinguish the tokens based on the context, thereby avoiding overfitting. The irrelevant document ψ_i^− is computed through hard negative mining [39], i.e.…”
Section: Document Pairs Construction (mentioning)
confidence: 99%
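As a rough illustration of that 25% replacement scheme (not the cited authors' code; the mask/random split and uniform sampling over the vocabulary are assumptions), a corruption step over token ids could look like this:

```python
import random

def corrupt_tokens(token_ids, vocab_size, mask_id, replace_prob=0.25, mask_frac=0.5):
    """Replace a fraction of token ids with [MASK] or random vocabulary tokens.

    A minimal sketch of the 25% token-replacement augmentation described above;
    the mask/random split and sampling scheme are illustrative assumptions.
    """
    corrupted = list(token_ids)
    for i in range(len(corrupted)):
        if random.random() < replace_prob:
            if random.random() < mask_frac:
                corrupted[i] = mask_id                       # special [MASK] token
            else:
                corrupted[i] = random.randrange(vocab_size)  # random vocab token
    return corrupted
```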
“…CodeBERT also employed Masked Language Modeling (MLM) [18] and Replaced Token Detection (RTD) [36] during pre-training, allowing the model to take tokens from random positions and mask them with special tokens, which are later used to predict the original tokens. As a result, each token is assigned a vector representation containing information about the token and its position in the given code.…”
Section: CodeBERT (mentioning)
confidence: 99%
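For context, the RTD objective [36] that CodeBERT borrows from ELECTRA trains a discriminator to classify every token as original or replaced (rather than regenerating masked tokens). A minimal sketch, assuming PyTorch tensors of token ids and per-token logits from a token-level classification head; the function names are illustrative:

```python
import torch

def rtd_labels(original_ids: torch.Tensor, corrupted_ids: torch.Tensor) -> torch.Tensor:
    """Replaced Token Detection targets: 1 where a token was replaced, else 0."""
    return (original_ids != corrupted_ids).long()

def rtd_loss(logits: torch.Tensor, original_ids: torch.Tensor,
             corrupted_ids: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over per-token original-vs-replaced predictions.

    `logits` is assumed to have the same shape as the id tensors, one score
    per token position.
    """
    labels = rtd_labels(original_ids, corrupted_ids).float()
    return torch.nn.functional.binary_cross_entropy_with_logits(logits, labels)
```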