2023
DOI: 10.48550/arxiv.2302.10447
Preprint
Mask-guided BERT for Few Shot Text Classification

Abstract: Transformer-based language models have achieved significant success in various domains. However, the data-intensive nature of the transformer architecture requires large amounts of labeled data, which is challenging in low-resource scenarios (i.e., few-shot learning (FSL)). The main challenge of FSL is the difficulty of training robust models on small numbers of samples, which frequently leads to overfitting. Here we present Mask-BERT, a simple and modular framework to help BERT-based architectures tackle FSL. The proposed…
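The abstract is truncated here, so the paper's actual masking strategy is not shown. As a rough, illustrative sketch of what mask-guided inputs for a BERT classifier could look like, the snippet below replaces tokens outside a keyword list with [MASK] before the forward pass; the keyword-based selection heuristic and the binary classification head are assumptions for demonstration, not the method proposed in the paper.

```python
# Illustrative sketch only: masks "non-salient" tokens before a BERT
# forward pass. The saliency heuristic (keep tokens that appear in a
# keyword list) is an assumption, not the paper's selection strategy.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

def mask_guided_encode(text, keep_words, max_length=64):
    """Tokenize `text` and replace tokens outside `keep_words` with [MASK]."""
    enc = tokenizer(text, truncation=True, max_length=max_length,
                    return_tensors="pt")
    input_ids = enc["input_ids"].clone()
    special = set(tokenizer.all_special_ids)
    keep_ids = set()
    for w in keep_words:
        keep_ids.update(tokenizer(w, add_special_tokens=False)["input_ids"])
    for i, tok in enumerate(input_ids[0].tolist()):
        if tok not in special and tok not in keep_ids:
            input_ids[0, i] = tokenizer.mask_token_id
    enc["input_ids"] = input_ids
    return enc

# Usage: run a masked input through a (here untrained) classification head.
enc = mask_guided_encode("the plot was dull but the acting was brilliant",
                         keep_words=["dull", "brilliant"])
with torch.no_grad():
    logits = model(**enc).logits
print(tokenizer.decode(enc["input_ids"][0]))
print(logits.softmax(-1))
```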

Cited by 3 publications (3 citation statements)
References 44 publications
“…Datasets used to train language models can be substantial (1, 29), often reaching hundreds of gigabytes, and they draw from various sources and domains (30-34). Consequently, even when trained on public data, these datasets can contain sensitive information, such as personally identifiable information (PII) including names, phone numbers, and addresses.…”
Section: The Emperor's New Clothes: Privacy Concerns (mentioning)
confidence: 99%
“…Prompt-based methods guide large language models to predict correct results by designing templates [8], [9], [10], [11]. Model design methods guide the model to learn from few-shot samples by changing the structure of the model [78]. Data augmentation uses similar characters [22], similar word semantics [30], [31], or knowledge bases [54], [79] to expand samples.…”
Section: Few-shot Text Classification (mentioning)
confidence: 99%
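The citing passage above sketches the template idea behind prompt-based few-shot classification. As a generic, hedged illustration (not tied to any specific cited work; the cloze template and the label words "great"/"terrible" are assumptions), a masked language model can score verbalizer words at a [MASK] slot:

```python
# Minimal cloze-style prompt classification with a masked LM.
# The template and verbalizer words are illustrative choices only.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm = BertForMaskedLM.from_pretrained("bert-base-uncased")

VERBALIZER = {"positive": "great", "negative": "terrible"}

def prompt_classify(text):
    # Wrap the input in a template containing a single [MASK] slot.
    prompt = f"{text} Overall, it was {tokenizer.mask_token}."
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = mlm(**enc).logits
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    # Score each label by the logit its verbalizer word receives at [MASK].
    scores = {}
    for label, word in VERBALIZER.items():
        word_id = tokenizer.convert_tokens_to_ids(word)
        scores[label] = logits[0, mask_pos, word_id].item()
    return max(scores, key=scores.get), scores

print(prompt_classify("The film was slow and predictable."))
```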
“…Over time, the scale of these models has grown exponentially. Early examples of language models include BERT [2], T5 [3], GPT-1 [4], GPT-2 [5] and various BERT variants [6, 7]. In addition, there exists a multitude of domain-specific BERT variants tailored to optimize performance in distinct fields of study or industry [8-10].…”
Section: Introduction (mentioning)
confidence: 99%