2019
DOI: 10.48550/arxiv.1911.03863
Preprint

Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

Cited by 13 publications (42 citation statements)
References 0 publications

“…The present analyses, when taken in conjunction with our main results in §4.1, suggest that fine-tuning on large training datasets with complex classifiers in the pursuit of state-of-the-art results has mostly nullified the impact of word order in the pre-trained representations. Few shot (Bansal et al, 2019) and few sample (Zhang et al, 2021) learning and evaluation could potentially require more word order signal, and thereby encouraging the model to leverage its own learned syntax better.…”
Section: The Usefulness of Word Order (mentioning)
confidence: 99%
“…We also include the SOTA results from [Bansal et al, 2020] for comparison and note that PACMAML is consistently the best performer over all three few-shot settings k = 4, 8, 16. In comparison, MAML and BMAML perform worse, possibly due to sensitivity to learning rates, as suggested by [Bansal et al, 2019]. Beyond generalization errors, in Table 2 (bottom) we also compare the memory usage of MAML/BMAML against PACMAML over different adaptive layer thresholds v. These results emphasize the computational advantage of PACMAML by showing that as more layers are adapted (lower v), MAML consumes significantly more memory due to its high-order derivatives.…”
Section: Few-Shot Classification Problems (mentioning)
confidence: 78%
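As a rough illustration of the memory point made in the statement above, the sketch below shows one second-order MAML inner/outer step in PyTorch. The model, data, and learning rate are placeholder assumptions, not the setup of the cited papers; the relevant detail is `create_graph=True`, which keeps the full inner-loop graph alive for every adapted parameter, so memory grows as more layers are adapted.

```python
# Minimal sketch (assumptions, not the cited papers' code): why second-order
# MAML memory scales with the number of adapted parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))
inner_lr = 0.01

def maml_inner_step(support_x, support_y):
    params = list(model.parameters())
    loss = F.cross_entropy(model(support_x), support_y)
    # create_graph=True retains the inner-loop graph for second-order
    # gradients; this is the part whose memory cost grows with the number
    # of adapted layers.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - inner_lr * g for p, g in zip(params, grads)]

def forward_with(adapted, x):
    # Functional forward pass using the adapted weights.
    w1, b1, w2, b2 = adapted
    return F.linear(F.relu(F.linear(x, w1, b1)), w2, b2)

# Outer step: the query loss backpropagates through the inner update above.
support_x, support_y = torch.randn(4, 768), torch.randint(0, 2, (4,))
query_x, query_y = torch.randn(4, 768), torch.randint(0, 2, (4,))
adapted = maml_inner_step(support_x, support_y)
outer_loss = F.cross_entropy(forward_with(adapted, query_x), query_y)
outer_loss.backward()  # second-order gradients flow to the original params
```

First-order variants avoid this cost by computing the inner gradients with `create_graph=False`, at the price of ignoring the second-order term.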
“…Natural Language Inference Lastly, we evaluate the meta-learning algorithms on the large-scale BERT-base [Devlin et al, 2019] model containing 110M parameters. Our experiment involves 12 practical natural language inference tasks from [Bansal et al, 2019]. Following [Bansal et al, 2019], we used the pretrained BERT-base model as our base model (hyperprior), and used GLUE benchmark tasks [Wang et al, 2018] for meta-training the models and meta-validation for hyperparameter search, before fine-tuning them for the 12 target tasks. The finetuning data contains k ∈ {4, 8, 16}-shot data for each class in each task.…”
Section: Few-Shot Classification Problems (mentioning)
confidence: 99%
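For readers unfamiliar with the k-shot fine-tuning protocol described in that quote, the following is a minimal, hypothetical sketch of fine-tuning a pretrained BERT-base classifier on k examples per class using Hugging Face Transformers. The sentence pairs, label count, epochs, and learning rate are illustrative assumptions, not the configuration used by the cited work.

```python
# Minimal sketch (illustrative assumptions): k-shot fine-tuning of BERT-base
# on a single sentence-pair classification task.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

k = 8  # shots per class, as in k ∈ {4, 8, 16}
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical k-shot support set: k premise/hypothesis pairs per class.
entail = ("A man is playing a guitar.", "A person makes music.")
neutral = ("A man is playing a guitar.", "The concert sold out.")
pairs = [entail, neutral] * k
labels = torch.tensor([0, 1] * k)

batch = tokenizer(
    [p for p, _ in pairs], [h for _, h in pairs],
    padding=True, truncation=True, return_tensors="pt",
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(10):  # a few epochs are typical with this little data
    optimizer.zero_grad()
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
```

In the meta-learning setting quoted above, this per-task fine-tuning step is preceded by meta-training on GLUE tasks; the sketch covers only the final k-shot adaptation stage.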