Findings of the Association for Computational Linguistics: EMNLP 2022
DOI: 10.18653/v1/2022.findings-emnlp.106
Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token

Cited by 1 publication (1 citation statement)
References 0 publications
“…A similar approach has been explored by masked autoencoders in vision, where 75% of the input patches are masked and removed from the input of the heavy encoder to achieve a 4.1× speedup. Recently, Liao et al (2022) have applied these architectural improvements to natural language pre-training, and together with a high masking rate can accelerate MLM by a third of the pre-training budget.…”
Section: Conclusion and Discussion
confidence: 99%
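The cited passage summarizes the mechanism the title refers to: mask a large fraction of the input, keep the masked positions out of the heavy encoder, and re-insert [MASK] embeddings only in a lightweight decoder. The sketch below is a minimal illustration of that idea, not the authors' implementation; the DisentangledMLM class name, the layer counts and sizes, and the 75% masking ratio are assumptions chosen for demonstration.

```python
# Illustrative sketch (assumed architecture, not the paper's code):
# the heavy encoder attends only over visible tokens, and [MASK]
# embeddings are introduced later, in a small decoder.
import torch
import torch.nn as nn


class DisentangledMLM(nn.Module):
    """Heavy encoder over visible tokens only; light decoder re-inserts [MASK]."""

    def __init__(self, vocab_size=30522, d_model=256, max_len=512, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.mask_embed = nn.Parameter(torch.zeros(d_model))  # learned [MASK] vector
        # Heavy encoder: processes only the ~25% visible tokens (source of the speedup).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=6
        )
        # Light decoder: sees the full-length sequence with [MASK] slots filled in.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                                   # tokens: (B, L)
        B, L = tokens.shape
        x = self.embed(tokens) + self.pos[:, :L]                 # (B, L, D)
        n_keep = int(L * (1 - self.mask_ratio))
        # Pick a random subset of positions to keep visible, per example.
        keep_idx = torch.rand(B, L, device=tokens.device).argsort(dim=1)[:, :n_keep]
        gather_idx = keep_idx.unsqueeze(-1).expand(-1, -1, x.size(-1))
        visible = torch.gather(x, 1, gather_idx)                 # (B, n_keep, D)
        enc = self.encoder(visible)                              # short sequence only
        # Re-insert encoded visible tokens; masked slots get the [MASK] embedding.
        full = self.mask_embed.expand(B, L, -1).clone()
        full.scatter_(1, gather_idx, enc)
        dec = self.decoder(full + self.pos[:, :L])               # full-length sequence
        return self.lm_head(dec)                                 # (B, L, vocab_size)


model = DisentangledMLM()
logits = model(torch.randint(0, 30522, (2, 128)))                # random token ids
print(logits.shape)                                              # torch.Size([2, 128, 30522])
```

In a pre-training loop of this kind, the cross-entropy loss would be computed only at the masked positions; the efficiency gain comes from the heavy encoder attending over roughly a quarter of the sequence length, as in the masked-autoencoder setup described in the quoted statement.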