Fast and Flexible Multi-Task Classification Using Conditional Neural Adaptive Processes

2019 · Preprint · DOI: 10.48550/arxiv.1906.07697

Cited by 7 publications (27 citation statements) · References 0 publications

“…Meta-models. Meta-models include methods based on MAML [93], ProtoNets [94], and auxiliary nets predicting task-specific parameters [95-98]. These methods are tied to a particular architecture and need to be trained from scratch if it is changed.…”
Section: Related Work (mentioning)
confidence: 99%
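
As an illustration of the prototype-based branch of these meta-models, the sketch below shows the core of a ProtoNets-style classifier: class prototypes are mean support embeddings, and queries are scored by negative squared distance. This is a minimal sketch; the function name, shapes, and toy data are illustrative assumptions, not code from any cited paper.

```python
import torch

def prototype_logits(support_feats, support_labels, query_feats, n_classes):
    """Score queries against class prototypes (mean support embeddings)."""
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0)  # one prototype per class
        for c in range(n_classes)
    ])                                                  # (n_classes, d)
    # Logits are negative squared Euclidean distances to each prototype.
    return -torch.cdist(query_feats, protos) ** 2       # (n_query, n_classes)

# Toy usage: a 2-way task with random 8-dimensional features.
support = torch.randn(10, 8)
labels = torch.tensor([0] * 5 + [1] * 5)
queries = torch.randn(4, 8)
logits = prototype_logits(support, labels, queries, n_classes=2)
```

The classifier head itself is architecture-agnostic, but the embedding network that produces the features is not, which is the constraint the excerpt highlights.
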
“…Denil et al. [113] train a model that can predict a fraction of network parameters given the remaining parameters, which requires retraining the model for each new architecture. Bertinetto et al. [98] train a model that predicts parameters given a new few-shot task, similarly to [18, 96], and the model is likewise tied to a particular architecture. HyperGAN [114] can generate an ensemble of trained parameters in a computationally efficient way but, like the aforementioned works, is constrained to a particular architecture.…”
Section: Appendix (mentioning)
confidence: 99%
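
The parameter-prediction idea these works share can be made concrete with a small hypernetwork: an auxiliary net maps a task embedding to the weights of a target layer. The sketch below is a minimal version under assumed names and sizes (HyperLinear, task_dim, and so on), not an implementation from [98], [113], or [114].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """Target linear layer whose weights come from a hypernetwork."""
    def __init__(self, task_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # The generator emits one flat vector holding W and b for the target layer.
        self.generator = nn.Linear(task_dim, out_dim * in_dim + out_dim)

    def forward(self, x, task_embedding):
        params = self.generator(task_embedding)
        n_w = self.out_dim * self.in_dim
        W = params[:n_w].view(self.out_dim, self.in_dim)
        b = params[n_w:]
        return F.linear(x, W, b)

layer = HyperLinear(task_dim=16, in_dim=32, out_dim=10)
y = layer(torch.randn(4, 32), torch.randn(16))  # task-conditioned prediction
```

The generator's output size is fixed by the target layer's shape, which is exactly why such models must be retrained when the architecture changes.
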
“…Another setting where FiLM layers have been shown to be effective is few-shot learning. This is the case, for instance, for TADAM (Oreshkin et al., 2018), CNAPs (Requeima et al., 2019), and CAVIA (Zintgraf et al., 2019), where FiLM layers are used to adapt a global cross-task model to particular tasks. Moreover, few-shot classification under domain shift is tackled by Tseng et al. (2020), where feature-wise transformations are used to diversify data and artificially create new domains at training time.…”
Section: Conditional Modeling (mentioning)
confidence: 99%
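
A FiLM layer is just a feature-wise affine transformation whose scale and shift are predicted from a conditioning input, here a task embedding. The sketch below shows the mechanism in that spirit; the module name and dimensions are illustrative assumptions, not code from TADAM, CNAPs, or CAVIA.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation of a conv feature map."""
    def __init__(self, task_dim, n_channels):
        super().__init__()
        self.to_gamma = nn.Linear(task_dim, n_channels)  # per-channel scale
        self.to_beta = nn.Linear(task_dim, n_channels)   # per-channel shift

    def forward(self, feature_map, task_embedding):
        # feature_map: (batch, channels, H, W); one embedding per task.
        gamma = self.to_gamma(task_embedding).view(1, -1, 1, 1)
        beta = self.to_beta(task_embedding).view(1, -1, 1, 1)
        return gamma * feature_map + beta

film = FiLM(task_dim=64, n_channels=32)
adapted = film(torch.randn(8, 32, 14, 14), torch.randn(64))
```

Because only the small gamma/beta generators depend on the task, the backbone stays shared across tasks, which is what makes FiLM attractive for adapting a global cross-task model.
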
“…where we have split Z_n into context (Z_{n,c}) and target (Z_{n,t}) sets. This is standard practice in both the NP (Garnelo et al., 2018a;b) and meta-learning settings (Finn et al., 2017) and relates to neural auto-regressive models (Requeima et al., 2019). Practically, stochastic gradient descent methods (Bottou, 2010) can be used to perform the optimization.…”
Section: Convolutional Conditional Neural Processes (mentioning)
confidence: 99%
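
The context/target split described here translates directly into a training step: sample a split of each task's points, condition the model on the context, and score the targets under the predictive distribution. Below is a minimal sketch assuming a hypothetical model that returns a Gaussian mean and standard deviation per target; the toy stand-in model is purely for runnability.

```python
import torch

def cnp_step_loss(model, x, y):
    """One task's maximum-likelihood loss: condition on a random context
    subset and score the held-out targets under the predictive Gaussian."""
    n = x.shape[0]
    n_ctx = torch.randint(1, n, (1,)).item()   # random context size in [1, n-1]
    perm = torch.randperm(n)
    ctx, tgt = perm[:n_ctx], perm[n_ctx:]
    mean, std = model(x[ctx], y[ctx], x[tgt])  # predict targets from context
    dist = torch.distributions.Normal(mean, std)
    return -dist.log_prob(y[tgt]).mean()       # minimize negative log-likelihood

# Toy stand-in model: predicts the context mean with unit variance.
def toy_model(x_ctx, y_ctx, x_tgt):
    m = y_ctx.mean() * torch.ones(x_tgt.shape[0])
    return m, torch.ones(x_tgt.shape[0])

loss = cnp_step_loss(toy_model, torch.randn(12, 1), torch.randn(12))
```

Averaging this loss over many sampled tasks and minimizing it by SGD is the maximum-likelihood procedure the excerpt refers to.
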
“…A key component of NPs is the embedding of context sets Z into a representation space through an encoder Z → E(Z), which is achieved using a deep set function approximator (Zaheer et al., 2017). This simple model specification allows NPs to be used for (i) meta-learning (Thrun & Pratt, 2012; Schmidhuber, 1987), since predictions can be generated on the fly from new context sets at test time; and (ii) multi-task or transfer learning (Requeima et al., 2019), since they provide a natural way of sharing information between data sets. Moreover, conditional NPs (CNPs; Garnelo et al., 2018a), a deterministic variant of NPs, can be trained in a particularly simple way with maximum likelihood learning of the parameters θ, which mimics how the system is used at test time, resulting in strong performance.…”
Section: Introduction (mentioning)
confidence: 99%
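
The deep set encoder Z → E(Z) mentioned above embeds each context point with a shared network and aggregates with a permutation-invariant operation, following Zaheer et al. (2017). Below is a minimal sketch with a mean aggregator; the class name and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DeepSetEncoder(nn.Module):
    """Permutation-invariant encoder: shared per-point MLP, then a mean."""
    def __init__(self, in_dim, hidden_dim, rep_dim):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, rep_dim),
        )

    def forward(self, xy_pairs):               # (set_size, in_dim)
        return self.phi(xy_pairs).mean(dim=0)  # order-invariant aggregate

enc = DeepSetEncoder(in_dim=3, hidden_dim=32, rep_dim=16)
r = enc(torch.randn(7, 3))  # representation of a 7-point context set
```

The mean makes the representation invariant to the order of context points and lets the encoder accept context sets of any size, which is what enables on-the-fly predictions from new contexts at test time.
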