Latent Multi-Task Architecture Learning

Ruder, Sebastian; Bingel, Joachim; Augenstein, Isabelle; Søgaard, Anders

doi:10.1609/aaai.v33i01.33014822

Cited by 250 publications

(177 citation statements)

References 7 publications

Supporting

Mentioning

165

Contrasting

Order By: Relevance

“…Additionally, with MTL comes a natural urge to simplify the models at hand and group the tasks that would benefit each other's learning process. With this mutually beneficial task relationship in mind, there are numerous domains and modalities [10,14,15,24,37,39,47,51] where the MTL methodology can be applied. As such, MTL is often used implicitly without a specific reference in methods such as transfer learning and fine-tuning [4,40] as well.…”

Section: Related Workmentioning

confidence: 99%

Learning Task Relatedness in Multi-Task Learning for Images in Context

Strezoski

Noord

Worring

2019

Proceedings of the 2019 on International Conference on Multimedia Retrieval

View full text Add to dashboard Cite

Multimedia applications often require concurrent solutions to multiple tasks. These tasks hold clues to each-others solutions, however as these relations can be complex this remains a rarely utilized property. When task relations are explicitly defined based on domain knowledge multi-task learning (MTL) offers such concurrent solutions, while exploiting relatedness between multiple tasks performed over the same dataset. In most cases however, this relatedness is not explicitly defined and the domain expert knowledge that defines it is not available. To address this issue, we introduce Selective Sharing, a method that learns the inter-task relatedness from secondary latent features while the model trains. Using this insight, we can automatically group tasks and allow them to share knowledge in a mutually beneficial way. We support our method with experiments on 5 datasets in classification, regression, and ranking tasks and compare to strong baselines and state-of-the-art approaches showing a consistent improvement in terms of accuracy and parameter counts. In addition, we perform an activation region analysis showing how Selective Sharing affects the learned representation.

show abstract

Section: Related Workmentioning

confidence: 99%

Learning Task Relatedness in Multi-Task Learning for Images in Context

Strezoski

Noord

Worring

2019

Proceedings of the 2019 on International Conference on Multimedia Retrieval

View full text Add to dashboard Cite

show abstract

“…3) We further propose an NDDR-CNN-Shortcut model, which further uses hierarchical features from different CNN levels for better training and convergence. Similarly, our network also takes the sluice network [40] as a special case: the sluice network predefines a fixed number of subspaces to fuse the features from different tasks between different subspaces (each contains multiple feature channels), while our model can automatically fuse the features according to each single channel.…”

Section: Relationship To State-of-the-art Methodsmentioning

confidence: 99%

“…We also investigate the performances of the cross-stitch network [31] and the state-ofthe-art sluice network [40] for comparison, in which we apply the same number of cross-stitch/sluice layers at the same locations as our NDDR layers. We use the number of subspaces as 2 for sluice network as suggested in [40]. For the fair comparison, we use the best hyperparameters in [31] and [40] to train the corresponding networks 3 .…”

Section: Methodsmentioning

confidence: 99%

See 1 more Smart Citation

NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction

Gao

Zhao

et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

203

126

View full text Add to dashboard Cite

In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusing at every layer from different tasks. This is in contrast with the most widely used MTL CNN structures which empirically or heuristically share features on some specific layers (e.g., share all the features except the last convolutional layer). The proposed layerwise feature fusing scheme is formulated by combining existing CNN components in a novel way, with clear mathematical interpretability as discriminative dimensionality reduction, which is referred to as Neural Discriminative Dimensionality Reduction (NDDR). Specifically, we first concatenate features with the same spatial resolution from different tasks according to their channel dimension. Then, we show that the discriminative dimensionality reduction can be fulfilled by 1 × 1 Convolution, Batch Normalization, and Weight Decay in one CNN. The use of existing CNN components ensures the end-to-end training and the extensibility of the proposed NDDR layer to various state-of-the-art CNN architectures in a "plug-andplay" manner. The detailed ablation analysis shows that the proposed NDDR layer is easy to train and also robust to different hyperparameters. Experiments on different task sets with various base network architectures demonstrate the promising performance and desirable generalizability of our proposed method. The code of our paper is available at https

show abstract

“…Our method falls into first category, and differentiates itself by performing "hard' partitioning of task-specific and shared features. By contrast, prior methods are based on "soft" sharing of features [4,12] or weights [19,13]. These methods generally learn a set of mixing coefficients that determine the weighted sum of features throughout the network, which does not impose connectivity structures on the architecture.…”

Section: Related Workmentioning

confidence: 99%

Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels

Bragman

Tanno

Ourselin

et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

View full text Add to dashboard Cite

The performance of multi-task learning in Convolutional Neural Networks (CNNs) hinges on the design of feature sharing between tasks within the architecture. The number of possible sharing patterns are combinatorial in the depth of the network and the number of tasks, and thus hand-crafting an architecture, purely based on the human intuitions of task relationships can be time-consuming and suboptimal. In this paper, we present a probabilistic approach to learning task-specific and shared representations in CNNs for multi-task learning. Specifically, we propose "stochastic filter groups" (SFG), a mechanism to assign convolution kernels in each layer to "specialist" or "generalist" groups, which are specific to or shared across different tasks, respectively. The SFG modules determine the connectivity between layers and the structures of task-specific and shared representations in the network. We employ variational inference to learn the posterior distribution over the possible grouping of kernels and network parameters. Experiments demonstrate that the proposed method generalises across multiple tasks and shows improved performance over baseline methods.

show abstract

Latent Multi-Task Architecture Learning

Cited by 250 publications

References 7 publications

Learning Task Relatedness in Multi-Task Learning for Images in Context

Learning Task Relatedness in Multi-Task Learning for Images in Context

NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction

Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels

Contact Info

Product

Resources

About