2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
DOI: 10.1109/cvpr.2017.579

UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision Using Diverse Datasets and Limited Memory

Abstract: In this work we introduce a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture that is trained end-to-end. Such a universal network can act like a 'swiss knife' for vision tasks; we call this architecture an UberNet to indicate its overarching nature. We address two main technical challenges that emerge when broadening up the range of tasks handled by a single CNN: (i) training a deep architecture while relying on diverse training sets and (…

citations
Cited by 549 publications
(420 citation statements)
references
References 97 publications
“…Finding an objective for such a broad and vague task appears futile so that it is easier to define a subset of tasks like figure ground segmentation, saliency and boundaries. A noteworthy implementation of such a decomposition is the recent DNN 'Uber-Net' (Kokkinos, 2016), which solves 7 vision related tasks (boundary, surface normals, saliency, semantic segmentation, semantic boundary and human parts detection) with a single multi-scale DNN network to reduce the memory footprint. It can be assumed that such a multi-task training improves convergence speed and better generalization to unseen data, something that already has been observed on other multi-task setups related to speech processing, vision and maze navigation (Bilen & Vedaldi, 2016; Caruana, 1998; Dietterich, Hild, & Bakiri, 1990, 1995; Mirowski et al, 2016).…”
Section: Box 1 Deep Neural Network (mentioning)
confidence: 99%
“…In particular, motivated by the success of deep learning to various tasks including object detection, dense semantic segmentation, and normal estimation of scenes [23,2,33,62] etc., we propose to exploit the available large scale facial databases captured both in controlled, as well as in unconstrained conditions [8,48] to train a fully convolutional deep network that maps image pixels to normals. More precisely, to acquire accurate ground truth of facial normals we synthesise images of faces created with the use of recently released Large-Scale 3D Facial Models (LSFM) [8] which contains facial shapes of individuals with diverse ethnicities and characteristics.…”
Section: Ours (mentioning)
confidence: 99%
“…Discriminative estimation of normals has started to recently received increased attention [2,13,62,33,45,15]. One of the first methods was proposed in [70].…”
Section: Prior Work On Discriminative Surface Normal Estimation (mentioning)
confidence: 99%