2021
DOI: 10.1109/tpami.2020.2983686

Deep High-Resolution Representation Learning for Visual Recognition

Abstract: High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, n…

citations
Cited by 2,163 publications
(1,267 citation statements)
references
References 135 publications
“…Object detection has attracted a great deal of attention in recent years [4,13,14,16,19,20,27,28,30,38,39,43,47,48,56]. One popular direction for recent object detection is proposal-based object detectors (a.k.a.…”
Section: Related Work (mentioning)
confidence: 99%
“…The network depth is of crucial importance for challenging image classification problems [30]. Many deep neural network architectures (which use hundreds of layers) such as ResNet [31], ResNext [57] or HRnet [58], have provided an outstanding performance in varied image datasets with multitude of different objects [59]. These complex architectures are usually exploited in a pre-trained fashion [60], which saves computational efforts and allows different domains to take advantage of their prediction capabilities when the scarcity of annotated examples invalidates the training of such models from scratch [61]- [63].…”
Section: Convolutional Neural Network (mentioning)
confidence: 99%
“…Inspired by High-Resolution Network [22], we develop a carefully modified HRNet containing three stages as the shared backbone network, which can be end-to-end trained. The input is first fed into a stem consisting of two 3 × 3 convolutions with stride 2 for resolution reduction, and subsequently transmitted the main body that includes parallel multi-branch convolutions with different resolutions.…”
Section: High-resolution Network (mentioning)
confidence: 99%
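
The stem described in the statement above (two 3 × 3 convolutions with stride 2) reduces each spatial dimension by roughly a factor of four. A minimal sketch of that arithmetic follows, assuming a padding of 1, which the quoted text does not state; the function names `conv_out_size` and `stem_out_size` are hypothetical and only illustrate the calculation.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one convolution, using the standard floor formula."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_out_size(size: int) -> int:
    """Spatial size after a stem of two 3x3, stride-2 convolutions.

    padding=1 is an assumption; the quoted description does not give it.
    """
    for _ in range(2):
        size = conv_out_size(size)
    return size


if __name__ == "__main__":
    for side in (224, 256):
        print(side, "->", stem_out_size(side))
```

Under these assumptions, the main body of the network would start from a feature map at one quarter of the input resolution.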
“…C in each residual unit is the number of channels. Following the design of HRNet [22], we gradually append high-to low-resolution streams, forming the new stage consisting of the previous resolution and an extra lower one, and connect the multi-resolution branches in parallel. The advantage is that the resulting representation is more precise spatial location and richer semantic information.…”
Section: High-resolution Network (mentioning)
confidence: 99%
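
The stage layout described in the statement above, where each new stage keeps the previous resolutions and appends one lower-resolution stream in parallel, can be sketched as a simple enumeration. This is an illustration only: the halving of resolution per added stream and the doubling of the channel count (C, 2C, 4C) are assumptions not stated in the quoted text, and `stage_streams` is a hypothetical helper name.

```python
from typing import List, Tuple


def stage_streams(stem_size: int, base_channels: int, num_stages: int = 3) -> List[List[Tuple[int, int]]]:
    """List the parallel streams of each stage as (spatial_size, channels) pairs.

    Assumptions (not in the quoted text): each appended stream halves the
    spatial size and doubles the channel count of the stream before it.
    """
    stages = []
    for stage in range(1, num_stages + 1):
        # Stage k keeps the k-1 streams of the previous stage and adds one lower-resolution stream.
        streams = [(stem_size // 2**i, base_channels * 2**i) for i in range(stage)]
        stages.append(streams)
    return stages


if __name__ == "__main__":
    for index, streams in enumerate(stage_streams(stem_size=64, base_channels=32), start=1):
        print(f"stage {index}: {streams}")
```

The highest-resolution stream appears in every stage, which is the property the citing authors credit for more precise spatial localization.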