2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw50498.2020.00361

AdaMT-Net: An Adaptive Weight Learning Based Multi-Task Learning Model For Scene Understanding

Citation types: 0 supporting, 8 mentioning, 0 contrasting
Cited by 15 publications (8 citation statements). References 13 publications.

“…As shown in the table, our model outperforms all previous works except for AdaMT-Net [5]. Compared to AdaMT-Net, our model improves mIoU and relative depth error by a fair margin.…”
Section: E. Results (mentioning)
confidence: 66%
“…We think this is due to whether or not attention modules are used. Previous works such as [4], [5], [8], [29] have used attention modules in their networks, enabling the model to "look" at the entire image during training. Our model, by contrast, uses only convolutional layers, so it can learn only from nearby pixels.…”
Section: E. Results (mentioning)
confidence: 99%
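
The receptive-field contrast drawn in this statement can be made concrete in code. The sketch below (assuming PyTorch; the module and its sizes are illustrative and are not taken from any of the cited works) shows a single-head self-attention layer whose attention matrix spans all spatial positions, next to a 3x3 convolution that mixes only neighbouring pixels:

```python
import torch
import torch.nn as nn

class GlobalSelfAttention2d(nn.Module):
    """Single-head self-attention: every position attends to all H*W positions."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)      # (B, HW, C)
        k = self.key(x).flatten(2)                        # (B, C, HW)
        v = self.value(x).flatten(2).transpose(1, 2)      # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW): global mixing
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                    # residual connection

x = torch.randn(1, 64, 32, 32)
local = nn.Conv2d(64, 64, kernel_size=3, padding=1)       # receptive field: 3x3 only
print(local(x).shape)                                     # torch.Size([1, 64, 32, 32])
print(GlobalSelfAttention2d(64)(x).shape)                 # torch.Size([1, 64, 32, 32])
```

The attention matrix is HW x HW, so a single layer already connects every pixel to every other pixel, whereas stacking 3x3 convolutions grows the receptive field only gradually.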
“…It is also possible to draw a distinction according to the level of parameter sharing in a multi-task framework. In particular, Jha et al. [120] distinguish between soft and hard sharing. In the former, models have a separate network for each task under consideration, resulting in a disjoint set of parameters.…”
Section: Depth As Prediction (mentioning)
confidence: 99%
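
The soft/hard distinction can be sketched directly. The following is an illustrative PyTorch example with assumed toy dimensions, not code from Jha et al. [120]: the hard-sharing model routes all tasks through a single shared backbone with task-specific heads, while the soft-sharing model keeps a disjoint network, and hence a disjoint parameter set, per task:

```python
import torch.nn as nn

def make_backbone() -> nn.Module:
    return nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())

class HardShared(nn.Module):
    """Hard sharing: one backbone whose parameters every task updates."""
    def __init__(self):
        super().__init__()
        self.backbone = make_backbone()          # shared parameters
        self.seg_head = nn.Conv2d(32, 13, 1)     # task-specific heads
        self.depth_head = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        f = self.backbone(x)
        return self.seg_head(f), self.depth_head(f)

class SoftShared(nn.Module):
    """Soft sharing: a disjoint network per task; any coupling comes from
    regularizing the two parameter sets toward each other during training."""
    def __init__(self):
        super().__init__()
        self.seg_net = nn.Sequential(make_backbone(), nn.Conv2d(32, 13, 1))
        self.depth_net = nn.Sequential(make_backbone(), nn.Conv2d(32, 1, 1))

    def forward(self, x):
        return self.seg_net(x), self.depth_net(x)
```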
“…Furthermore, learning multiple tasks at once can improve generalization and lead to better results than single-task training. A number of works exist that tackle the different scene-understanding tasks in a multi-task setting [18,28,29,48,55,63,68,74,79,80]. Goel et al. [18] propose QuadroNet, a real-time-capable model that predicts 2D bounding boxes, panoptic segmentation, and depth from single images.…”
Section: Multi-task Learning (mentioning)
confidence: 99%
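
Since the paper under discussion concerns adaptive weight learning for multi-task training, a generic sketch of learned loss weighting may help. The example below uses homoscedastic-uncertainty weighting in the spirit of Kendall et al. (2018); it is an illustration under stated assumptions, not the specific weighting scheme of AdaMT-Net, QuadroNet, or any work cited above:

```python
# Illustrative only: learned per-task loss weights via log-variances s_i,
# with total loss L = sum_i exp(-s_i) * L_i + s_i (regression form).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveWeightedLoss(nn.Module):
    def __init__(self, num_tasks: int):
        super().__init__()
        # s_i = log(sigma_i^2), learned jointly with the network weights
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros(())
        for s, loss in zip(self.log_vars, task_losses):
            total = total + torch.exp(-s) * loss + s
        return total

# Toy usage: a stand-in two-output network trained on two regression tasks.
model = nn.Linear(8, 2)
criterion = AdaptiveWeightedLoss(num_tasks=2)
opt = torch.optim.Adam(list(model.parameters()) + list(criterion.parameters()))

x, y = torch.randn(4, 8), torch.randn(4, 2)
opt.zero_grad()
pred = model(x)
loss = criterion([F.mse_loss(pred[:, 0], y[:, 0]),
                  F.mse_loss(pred[:, 1], y[:, 1])])
loss.backward()
opt.step()
```

Tasks whose losses are noisy or hard drive their log-variance up, which automatically down-weights them relative to easier tasks as training proceeds.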