2020
DOI: 10.48550/arxiv.2006.14769
Preprint

Supermasks in Superposition

Abstract: We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks…
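For readers who want a concrete picture of the inference step described above, the following is a minimal sketch, assuming a PyTorch toy setup with a single fixed random linear layer. The supermasks here are random placeholders rather than masks trained per task, the output-entropy objective stands in for the confidence criterion that the truncated abstract cuts off, and names such as `masks`, `alphas`, and `infer_task` are illustrative rather than taken from the authors' code.

```python
import torch

torch.manual_seed(0)
num_tasks, in_dim, out_dim = 5, 20, 10

# Fixed, randomly initialized base weights (never trained in SupSup).
W = torch.randn(out_dim, in_dim)

# One binary supermask per task. Random here purely for illustration; in the
# paper each mask is found by optimizing mask scores on its task.
masks = [(torch.rand(out_dim, in_dim) > 0.5).float() for _ in range(num_tasks)]

def infer_task(x: torch.Tensor) -> int:
    # Superpose all supermasks with coefficients alpha, initialized uniformly.
    alphas = torch.full((num_tasks,), 1.0 / num_tasks, requires_grad=True)
    mixed_mask = torch.stack(masks).mul(alphas.view(-1, 1, 1)).sum(0)
    logits = x @ (W * mixed_mask).t()
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
    # Single gradient step w.r.t. alpha: the mask whose coefficient most
    # reduces the output entropy (most negative gradient) is the inferred task.
    grad, = torch.autograd.grad(entropy, alphas)
    return int(grad.argmin())

x = torch.randn(1, in_dim)
print("inferred task:", infer_task(x))
```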

Cited by 8 publications (18 citation statements)
References 29 publications (85 reference statements)
“…Transfer and cross-domain learning: Transfer learning as a field is quite varied. Methods are variously classified under few-shot [24,44], continual learning [34,47], and lifelong learning [27,38]. In general, transfer learning methods seek to use knowledge learned from one domain in another to improve performance [33].…”
Section: Related Work
confidence: 99%
“…Parameter-isolation-based methods adaptively introduce new parameters for new tasks to avoid the parameters of previous tasks being drastically changed [32], [33], [34], [35], [36]. For instance, the progressive network [32] allocates a new sub-network for each new task and blocks any modification to the previously learned networks.…”
Section: Continual Learning
confidence: 99%
“…Yoon et al. [33] proposed a more flexible model (DEN) that dynamically adds new neurons to accommodate new tasks. Recently, various innovative approaches to allocating separate parameters for different tasks have been developed [34], [35], [36]. Besides these concrete models, Knoblauch et al. [37] analyzed the capability required of an optimal continual learning agent.…”
Section: Continual Learning
confidence: 99%
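The two statements above describe parameter-isolation methods: a new sub-network is allocated per task while previously learned parameters are left untouched. The snippet below is a minimal sketch of that allocate-and-freeze pattern only; `IsolatedColumns`, `add_task`, and the single-linear-layer columns are my own simplifications and do not reproduce the progressive-network or DEN architectures.

```python
import torch
import torch.nn as nn

class IsolatedColumns(nn.Module):
    """Keeps one independent sub-network ("column") per task."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.columns = nn.ModuleList()              # one sub-network per task

    def add_task(self) -> int:
        # Freeze everything learned so far, then allocate fresh parameters.
        for p in self.parameters():
            p.requires_grad_(False)
        self.columns.append(nn.Linear(self.in_dim, self.out_dim))
        return len(self.columns) - 1                # id of the new task

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.columns[task_id](x)

model = IsolatedColumns(in_dim=8, out_dim=3)
t0 = model.add_task()      # train model.columns[t0] on task 0 ...
t1 = model.add_task()      # ... its weights stay frozen while task 1 trains
print(model(torch.randn(2, 8), task_id=t1).shape)  # torch.Size([2, 3])
```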