2019
DOI: 10.1609/aaai.v33i01.33014822

Latent Multi-Task Architecture Learning

Abstract: Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns…
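To make the three design choices in the abstract concrete, here is a minimal PyTorch-style sketch of a sluice-network-like latent architecture: learnable alpha coefficients decide how much each task's layer output borrows from the other task (which layers share, and how much), learnable beta weights decide which layers feed each task's predictor, and learnable task-loss weights set the relative losses. All names (SluiceLikeMTL, alpha, beta, log_loss_w) are illustrative assumptions, not the paper's released code.

```python
# Hedged sketch of latent multi-task architecture learning: soft sharing via
# alpha, layer selection via beta, and learnable task-loss weights.
import torch
import torch.nn as nn


class SluiceLikeMTL(nn.Module):
    def __init__(self, in_dim=16, hidden=32, n_layers=3, n_tasks=2):
        super().__init__()
        self.n_tasks, self.n_layers = n_tasks, n_layers
        # One stack of layers per task ("columns").
        self.columns = nn.ModuleList([
            nn.ModuleList([nn.Linear(in_dim if l == 0 else hidden, hidden)
                           for l in range(n_layers)])
            for _ in range(n_tasks)
        ])
        # (a)/(b) alpha: per-layer mixing of the tasks' outputs (identity init).
        self.alpha = nn.Parameter(torch.eye(n_tasks).repeat(n_layers, 1, 1))
        # (b) beta: which layers feed each task's prediction.
        self.beta = nn.Parameter(torch.zeros(n_tasks, n_layers))
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_tasks)])
        # (c) relative task-loss weights, also learnable here.
        self.log_loss_w = nn.Parameter(torch.zeros(n_tasks))

    def forward(self, x):
        states = [x for _ in range(self.n_tasks)]
        per_layer = [[] for _ in range(self.n_tasks)]
        for l in range(self.n_layers):
            outs = [torch.relu(self.columns[t][l](states[t]))
                    for t in range(self.n_tasks)]
            mix = torch.softmax(self.alpha[l], dim=-1)          # (T, T)
            states = [sum(mix[t, s] * outs[s] for s in range(self.n_tasks))
                      for t in range(self.n_tasks)]
            for t in range(self.n_tasks):
                per_layer[t].append(states[t])
        preds = []
        for t in range(self.n_tasks):
            w = torch.softmax(self.beta[t], dim=-1)              # (L,)
            combined = sum(w[l] * per_layer[t][l] for l in range(self.n_layers))
            preds.append(self.heads[t](combined))
        return preds, torch.exp(self.log_loss_w)                 # task-loss weights
```

With identity initialisation for alpha, the sketch starts as two independent networks and can drift toward sharing during training; the returned loss weights would multiply the per-task losses in the training objective.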

Cited by 250 publications (177 citation statements) · References 7 publications
“…Additionally, with MTL comes a natural urge to simplify the models at hand and group the tasks that would benefit each other's learning process. With this mutually beneficial task relationship in mind, there are numerous domains and modalities [10,14,15,24,37,39,47,51] where the MTL methodology can be applied. As such, MTL is often used implicitly without a specific reference in methods such as transfer learning and fine-tuning [4,40] as well.…”
Section: Related Work (mentioning)
Confidence: 99%
“…3) We further propose an NDDR-CNN-Shortcut model, which further uses hierarchical features from different CNN levels for better training and convergence. Similarly, our network also takes the sluice network [40] as a special case: the sluice network predefines a fixed number of subspaces to fuse the features from different tasks between different subspaces (each contains multiple feature channels), while our model can automatically fuse the features according to each single channel.…”
Section: Relationship to State-of-the-Art Methods (mentioning)
Confidence: 99%
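To illustrate the distinction this citing work draws, below is a hedged PyTorch-style sketch (assumed class names) of the two fusion granularities: a sluice-style layer mixes a small, predefined number of channel subspaces across two tasks, while an NDDR-style layer fuses the concatenated features channel by channel with 1x1 convolutions. This is an interpretation of the mechanisms described, not code from either paper.

```python
import torch
import torch.nn as nn


class SubspaceFusion(nn.Module):
    """Sluice-style: mix K predefined subspaces (blocks of channels) across two tasks."""
    def __init__(self, channels, n_subspaces=2):
        super().__init__()
        assert channels % n_subspaces == 0
        self.k = n_subspaces
        # One learnable 2x2 mixing matrix per subspace (identity init).
        self.mix = nn.Parameter(torch.eye(2).repeat(n_subspaces, 1, 1))

    def forward(self, feat_a, feat_b):                    # (N, C, H, W) each
        a_blocks = feat_a.chunk(self.k, dim=1)
        b_blocks = feat_b.chunk(self.k, dim=1)
        out_a, out_b = [], []
        for i in range(self.k):
            m = self.mix[i]
            out_a.append(m[0, 0] * a_blocks[i] + m[0, 1] * b_blocks[i])
            out_b.append(m[1, 0] * a_blocks[i] + m[1, 1] * b_blocks[i])
        return torch.cat(out_a, dim=1), torch.cat(out_b, dim=1)


class ChannelFusion(nn.Module):
    """NDDR-style: fuse per channel via 1x1 convolutions over concatenated features."""
    def __init__(self, channels):
        super().__init__()
        self.to_a = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.to_b = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_a, feat_b):
        cat = torch.cat([feat_a, feat_b], dim=1)
        return self.to_a(cat), self.to_b(cat)
```

In this reading, a single subspace reduces the sluice-style layer to a cross-stitch-like unit, while letting the number of subspaces grow toward the channel count approaches the per-channel fusion that the NDDR-style layer performs directly.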
“…We also investigate the performance of the cross-stitch network [31] and the state-of-the-art sluice network [40] for comparison, in which we apply the same number of cross-stitch/sluice layers at the same locations as our NDDR layers. We use two subspaces for the sluice network, as suggested in [40]. For a fair comparison, we use the best hyperparameters in [31] and [40] to train the corresponding networks.…”
Section: Methods (mentioning)
Confidence: 99%
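As a rough illustration of applying "the same number of layers at the same locations", here is a self-contained sketch of a cross-stitch unit and of a two-column network that inserts one fusion unit after every stage; a sluice layer with two subspaces or an NDDR layer could be substituted at the same points. All names (CrossStitchUnit, TwoColumnNet) are hypothetical.

```python
import torch
import torch.nn as nn


class CrossStitchUnit(nn.Module):
    """Learnable 2x2 mixing of two tasks' feature maps (identity init)."""
    def __init__(self):
        super().__init__()
        self.mix = nn.Parameter(torch.eye(2))

    def forward(self, feat_a, feat_b):
        out_a = self.mix[0, 0] * feat_a + self.mix[0, 1] * feat_b
        out_b = self.mix[1, 0] * feat_a + self.mix[1, 1] * feat_b
        return out_a, out_b


class TwoColumnNet(nn.Module):
    """Two task-specific columns with a fusion unit at identical locations."""
    def __init__(self, channels=(3, 16, 32)):
        super().__init__()
        stages = list(zip(channels[:-1], channels[1:]))
        block = lambda i, o: nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU())
        self.col_a = nn.ModuleList([block(i, o) for i, o in stages])
        self.col_b = nn.ModuleList([block(i, o) for i, o in stages])
        self.fuse = nn.ModuleList([CrossStitchUnit() for _ in stages])

    def forward(self, x):
        a = b = x
        for stage_a, stage_b, fuse in zip(self.col_a, self.col_b, self.fuse):
            a, b = fuse(stage_a(a), stage_b(b))
        return a, b
```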
See 1 more Smart Citation
“…Our method falls into the first category, and differentiates itself by performing "hard" partitioning of task-specific and shared features. By contrast, prior methods are based on "soft" sharing of features [4,12] or weights [19,13]. These methods generally learn a set of mixing coefficients that determine the weighted sum of features throughout the network, which does not impose connectivity structures on the architecture.…”
Section: Related Work (mentioning)
Confidence: 99%
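A small sketch of the contrast this citing work describes, under invented names: soft sharing keeps every feature as a learned weighted blend of the two tasks' features, whereas hard partitioning splits channels into disjoint shared and task-specific groups, so each task head only reads the shared group plus its own group and a connectivity structure is imposed on the architecture.

```python
# Hedged sketch: soft sharing via a mixing coefficient vs. hard partitioning
# of channels into shared / task-A / task-B groups. Names are illustrative.
import torch


def soft_share(feat_a, feat_b, w):
    # w is a learnable scalar mixing coefficient; every feature is a blend.
    return w * feat_a + (1.0 - w) * feat_b


def hard_partition(features, n_shared, n_a, n_b):
    # features: (N, C) with C == n_shared + n_a + n_b; the disjoint channel
    # groups determine which task head can read which features.
    shared, only_a, only_b = torch.split(features, [n_shared, n_a, n_b], dim=1)
    task_a_input = torch.cat([shared, only_a], dim=1)
    task_b_input = torch.cat([shared, only_b], dim=1)
    return task_a_input, task_b_input


if __name__ == "__main__":
    x = torch.randn(4, 12)
    a_in, b_in = hard_partition(x, n_shared=6, n_a=3, n_b=3)
    print(a_in.shape, b_in.shape)   # torch.Size([4, 9]) torch.Size([4, 9])
```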