Multi-task Learning Using Multi-modal Encoder-Decoder Networks with Shared Skip Connections

Kuga, Ryohei; Kanezaki, Asako; Samejima, Masaki; Sugano, Yusuke; Matsushita, Yasuyuki

doi:10.1109/iccvw.2017.54

Cited by 29 publications

(17 citation statements)

References 14 publications


“…Some works [40,51,18,26] explored simultaneously learning the depth estimation and the scene parsing tasks. For instance, Wang et al [51] introduced an approach to model the two tasks within a hierarchical CRF, while the CRF model is not jointly learned with the CNN.…”

Section: Related Work (mentioning)

confidence: 99%

PAD-Net: Multi-tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

Ouyang, Wang, et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition



Depth estimation and scene parsing are two particularly important tasks in visual scene understanding. In this paper we tackle the problem of simultaneous depth estimation and scene parsing in a joint CNN. The task can typically be treated as a deep multi-task learning problem [42]. Unlike previous methods that directly optimize multiple tasks given the input training data, this paper proposes a novel multi-task guided prediction-and-distillation network (PAD-Net), which first predicts a set of intermediate auxiliary tasks ranging from low level to high level; the predictions from these intermediate auxiliary tasks are then used as multi-modal input to our proposed multi-modal distillation modules for the final tasks. During joint learning, the intermediate tasks not only act as supervision for learning more robust deep representations but also provide rich multi-modal information for improving the final tasks. Extensive experiments on two challenging datasets (NYUD-v2 and Cityscapes) for both depth estimation and scene parsing demonstrate the effectiveness of the proposed approach.


“…Multi-task learning [11,8] shown to improve the performance of different tasks with auxiliary objective functions. We explore an unsupervised reconstruction task that seeks to reproduce the sequential US slices to aid the weak supervision of the segmentation task.…”

Section: Related Work (mentioning)

confidence: 99%

Spatio-Temporal Consistency and Negative Label Transfer for 3D Freehand US Segmentation

Duque, Chanti, Crouzier, et al. 2020

Medical Image Computing and Computer Assisted Intervention – MICCAI 2020


The manual segmentation of multiple organs in 3D ultrasound (US) sequences and volumes for quantitative analysis is very expensive and time-consuming. Fully supervised segmentation methods still require the collection of large volumes of annotated data, while unlabeled images are abundant. In this work, we propose a semi-automatic deep learning approach modeled as a weak-label learning problem: given a few 2-D incomplete annotations for selected slices, the goal is to propagate the masks to the entire sequence. To this end, we use both positive and negative constraints induced by incomplete labels to penalize the segmentation loss function. Our model is composed of one encoder and two decoders, which model the segmentation task and an auxiliary reconstruction task. Moreover, we exploit spatio-temporal information by deploying a Convolutional Long Short-Term Memory module. Our findings suggest that the reconstruction decoder and the spatio-temporal information lead to a better geometrical estimation of the mask shape. We apply the model to lower-limb muscle segmentation on a dataset of 44 patients and 6160 images.


“…We have outlined above new approaches in the digital humanities that are enabled by the dataset and benchmark evaluation tasks. Multitask learning systems that learn on multimodal data are also an active area of research in relation to multimodal representation learning, location estimation, and scene understanding [5,28]. MLM is further designed to evaluate the ability for multitask systems to leverage relationships between constituent entities in data and knowledge graph properties used in the generation process.…”

Section: Impact (mentioning)

confidence: 99%

MLM

Armitage, Kacupaj, Tahmasebzadeh, et al. 2020

Proceedings of the 29th ACM International Conference on Information & Knowledge Management


In this paper, we introduce the MLM (Multiple Languages and Modalities) dataset, a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and inclusion of semantic data provide a resource that further tests the ability of multitask systems to learn relationships between entities. The dataset is designed for researchers and developers who build applications that perform multiple tasks on data encountered on the web and in digital archives. A second version of MLM provides a geo-representative subset of the data with weighted samples for countries of the European Union. We demonstrate the value of the resource in developing novel applications in the digital humanities with a motivating use case, and we specify a benchmark set of tasks to retrieve modalities and locate entities in the dataset. Evaluation of baseline multitask and single-task systems on the full and geo-representative versions of MLM demonstrates the challenges of generalising on diverse data. In addition to the digital humanities, we expect the resource to contribute to research in multimodal representation learning, location estimation, and scene understanding.

CCS CONCEPTS • Machine learning; • Multitask learning; • Multimodal data; • Multilingual data
