Teaching with Commentaries
2020 | Preprint
DOI: 10.48550/arxiv.2011.03037

Cited by 3 publications (4 citation statements) | References 0 publications
“…FT Hypergradient: For the gradient through FT, we acknowledge that differentiating through only one step could, in theory, produce biased hypergradients. However, several prior works on meta-learning various structures similar to what we consider [51,54,42,49,29] did not observe significant bias. Therefore, from an empirical standpoint, this bias is not necessarily expected to be a significant issue.…”
Section: A Notation and Acronyms
Mentioning confidence: 66%
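To make the "differentiating through only one step" idea in this excerpt concrete, here is a minimal sketch (not the code of this paper or of the citing work): a meta-parameter `hyper` re-weights per-example training losses, one SGD fine-tuning step is unrolled, and the hypergradient is the gradient of the post-step validation loss with respect to `hyper`. The reweighting form and all names are illustrative assumptions.

```python
# Minimal sketch of a one-step unrolled hypergradient (illustrative assumptions).
import jax
import jax.numpy as jnp

def inner_loss(w, hyper, batch):
    x, y = batch
    pred = x @ w
    # `hyper` re-weights per-example squared errors; this weighting scheme is
    # an assumption for illustration, not the cited papers' exact structure.
    return jnp.mean(hyper * (pred - y) ** 2)

def val_loss(w, batch):
    x, y = batch
    return jnp.mean((x @ w - y) ** 2)

def one_step_hypergrad(w, hyper, train_batch, val_batch, lr=0.1):
    def meta_objective(h):
        # One unrolled SGD fine-tuning step on the (re-weighted) training loss...
        g = jax.grad(inner_loss)(w, h, train_batch)
        w_new = w - lr * g
        # ...then evaluate the updated weights on validation data.
        return val_loss(w_new, val_batch)
    # Hypergradient: d(validation loss after one step) / d(hyper).
    return jax.grad(meta_objective)(hyper)

# Toy usage
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 3))
y = x @ jnp.array([1.0, -2.0, 0.5])
w = jnp.zeros(3)
hyper = jnp.ones(8)
print(one_step_hypergrad(w, hyper, (x, y), (x, y)))
```

As the excerpt notes, truncating the unroll to a single step can bias this hypergradient relative to fully optimizing the inner problem, but several cited works report that the bias is not a practical obstacle.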
“…Applications of Nested Optimization: Many prior works frame learning as nested optimization, including few-shot learning [16,1,17,55,21,58,53,75,31,38], neural network teaching [14,15,62,54], learning data augmentation and reweighting strategies [32,22,57,60,29], and auxiliary task learning [49,51,39]. The majority of this work studies nested optimization in the standard one-stage supervised learning paradigm, unlike our setting: the two-stage PT & FT problem.…”
Section: Related Work
Mentioning confidence: 99%
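As context for the "nested optimization" framing in this excerpt, a generic bilevel formulation that these works (and the two-stage PT & FT setting) instantiate is sketched below; the symbols λ (meta-parameters) and w (model weights) are illustrative, not the notation of any specific cited paper.

```latex
% Generic nested (bilevel) optimization: outer meta-parameters \lambda,
% inner model weights w. Illustrative notation only.
\lambda^{*} \in \arg\min_{\lambda} \; \mathcal{L}_{\mathrm{outer}}\!\big(w^{*}(\lambda), \lambda\big)
\qquad \text{s.t.} \qquad
w^{*}(\lambda) \in \arg\min_{w} \; \mathcal{L}_{\mathrm{inner}}(w, \lambda)
```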
“…Auxiliary losses are pooled over all examples and training epochs and their effect is only known at validation/test time. We would need to use implicit gradients [29,35] to know their eventual effect on the final weights at the end of training. With tailoring, we can directly measure the effect of the meta-learned update on the same sample.…”
Section: Noether Network
Mentioning confidence: 99%
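For reference, the standard implicit-function-theorem form of the hypergradient that "implicit gradients" refers to in this excerpt is shown below; the notation (λ for the meta-learned quantity, w* for the final weights, train/val losses) is illustrative and not taken from references [29, 35].

```latex
% Implicit (IFT) hypergradient at an inner optimum w^{*}(\lambda):
\frac{d \mathcal{L}_{\mathrm{val}}}{d \lambda}
  = \frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial \lambda}
  + \frac{\partial \mathcal{L}_{\mathrm{val}}}{\partial w}\,
    \frac{\partial w^{*}}{\partial \lambda},
\qquad
\frac{\partial w^{*}}{\partial \lambda}
  = -\left(\frac{\partial^{2} \mathcal{L}_{\mathrm{train}}}{\partial w\, \partial w^{\top}}\right)^{-1}
    \frac{\partial^{2} \mathcal{L}_{\mathrm{train}}}{\partial w\, \partial \lambda^{\top}}
```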
“…A growing number of applications require learning in games, which generalize single-objective optimization. Common examples are GANs (Goodfellow et al., 2014), actor-critic models (Pfau & Vinyals, 2016), curriculum learning (Baker et al., 2019; Balduzzi et al., 2019; Sukhbaatar et al., 2018), hyperparameter optimization (Lorraine & Duvenaud, 2018; Lorraine et al., 2020; MacKay et al., 2019; Raghu et al., 2020), adversarial examples (Bose et al., 2020; Yuan et al., 2019), learning models (Rajeswaran et al., 2020; Abachi et al., 2020; Bacon et al., 2019), domain adversarial adaptation (Acuna et al., 2021), neural architecture search (Grathwohl et al., 2018; Adam & Lorraine, 2019), and meta-learning (Ren et al., 2018;.…”
Section: Introduction
Mentioning confidence: 99%