2020
DOI: 10.48550/arxiv.2006.14769
Preprint

Supermasks in Superposition

Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks…
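For readers who want a concrete picture of the inference step described above, the following is a minimal sketch, assuming a PyTorch toy setup with a single fixed random linear layer. The supermasks here are random placeholders rather than masks trained per task, the output-entropy objective stands in for the confidence criterion that the truncated abstract cuts off, and names such as `masks`, `alphas`, and `infer_task` are illustrative rather than taken from the authors' code.

```python
import torch

torch.manual_seed(0)
num_tasks, in_dim, out_dim = 5, 20, 10

# Fixed, randomly initialized base weights (never trained in SupSup).
W = torch.randn(out_dim, in_dim)

# One binary supermask per task. Random here purely for illustration; in the
# paper each mask is found by optimizing mask scores on its task.
masks = [(torch.rand(out_dim, in_dim) > 0.5).float() for _ in range(num_tasks)]

def infer_task(x: torch.Tensor) -> int:
    # Superpose all supermasks with coefficients alpha, initialized uniformly.
    alphas = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    mixed_mask = torch.stack(masks).mul(alphas.view(-1, 1, 1)).sum(0)
    logits = x @ (W * mixed_mask).t()
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    # Single gradient step w.r.t. alpha: the mask whose coefficient most
    # reduces the output entropy (most negative gradient) is the inferred task.
    grad, = torch.autograd.grad(entropy, alphas)
    return int(grad.argmin())

x = torch.randn(1, in_dim)
print("inferred task:", infer_task(x))
```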

Cited by 8 publications (18 citation statements)
References 29 publications (85 reference statements)
“…Transfer and cross-domain learning: Transfer learning as a field is quite varied. Methods are variously classified under few-shot [24,44], continual learning [34,47], and lifelong learning [27,38]. In general, transfer learning methods seek to use knowledge learned from one domain in another to improve performance [33].…”
Section: Related Work
confidence: 99%
“…Parameter-isolation-based methods adaptively introduce new parameters for new tasks to avoid the parameters of previous tasks being drastically changed [32], [33], [34], [35], [36]. For instance, the progressive network [32] allocates a new sub-network for each new task and blocks any modification to the previously learned networks.…”
Section: Continual Learning
confidence: 99%
“…Yoon et al. [33] proposed a more flexible model (DEN) that dynamically adds new neurons to accommodate new tasks. Recently, various innovative approaches to allocating separate parameters for different tasks have been developed [34], [35], [36]. Besides these concrete models, Knoblauch et al. [37] analyzed the capability required of an optimal continual learning agent.…”
Section: Continual Learning
confidence: 99%
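The two statements above describe parameter-isolation methods: a new sub-network is allocated per task while previously learned parameters are left untouched. The snippet below is a minimal sketch of that allocate-and-freeze pattern only; `IsolatedColumns`, `add_task`, and the single-linear-layer columns are my own simplifications and do not reproduce the progressive-network or DEN architectures.

```python
import torch
import torch.nn as nn

class IsolatedColumns(nn.Module):
    """Keeps one independent sub-network ("column") per task."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.columns = nn.ModuleList()              # one sub-network per task

    def add_task(self) -> int:
        # Freeze everything learned so far, then allocate fresh parameters.
        for p in self.parameters():
            p.requires_grad_(False)
        self.columns.append(nn.Linear(self.in_dim, self.out_dim))
        return len(self.columns) - 1                # id of the new task

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.columns[task_id](x)

model = IsolatedColumns(in_dim=8, out_dim=3)
t0 = model.add_task()      # train model.columns[t0] on task 0 ...
t1 = model.add_task()      # ... its weights stay frozen while task 1 trains
print(model(torch.randn(2, 8), task_id=t1).shape)  # torch.Size([2, 3])
```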