2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.37

Exploit Bounding Box Annotations for Multi-Label Object Recognition

Abstract: Convolutional neural networks (CNNs) have shown great performance as general feature representations for object recognition applications. However, for multi-label images that contain multiple objects from different categories, scales and locations, global CNN features are not optimal. In this paper, we incorporate local information to enhance the feature discriminative power. In particular, we first extract object proposals from each image. With each image treated as a bag and object proposals extracted from it…
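As a rough illustration of the bag-and-instance formulation sketched in the abstract, the snippet below extracts CNN features for a set of proposal boxes and stacks them into one "bag" per image. The backbone, crop size, and box format are assumptions chosen for this sketch; the paper's own proposal generator and network are not reproduced here.

```python
# Sketch only: the image is a "bag", its proposal crops are the "instances".
# The ImageNet-pretrained ResNet-18 backbone and the fixed 224x224 crop size
# are illustrative assumptions, not the paper's exact pipeline.
import torch
import torchvision

backbone = torchvision.models.resnet18(weights="DEFAULT")
backbone.fc = torch.nn.Identity()   # expose the pooled 512-d feature
backbone.eval()

def bag_of_instances(image: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """image: (3, H, W); boxes: (P, 4) proposals as (x1, y1, x2, y2) pixels.
    Returns a (P, 512) bag with one feature vector per proposal."""
    crops = []
    for x1, y1, x2, y2 in boxes.round().int().tolist():
        crop = image[:, y1:y2, x1:x2].unsqueeze(0)          # (1, 3, h, w)
        crops.append(torch.nn.functional.interpolate(
            crop, size=(224, 224), mode="bilinear"))        # warp to fixed size
    with torch.no_grad():
        return backbone(torch.cat(crops))                   # (P, 512)
```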

Cited by 156 publications (110 citation statements)
References 22 publications (47 reference statements)

“…Recent progress on multi-label image classification relies on combining object localization with deep learning techniques [28,30]. Generally, these methods introduce object proposals [35] that are assumed to contain all possible foreground objects in the image, and aggregate the features extracted from all of these proposals to incorporate local information.…”
Section: Related Work
confidence: 99%
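A minimal sketch of the feature aggregation step this statement describes: element-wise max pooling over per-proposal features (e.g. the bag produced in the sketch above) followed by a linear multi-label classifier. The feature dimension and class count are placeholders, not values taken from the cited methods.

```python
import torch

class MaxPoolAggregator(torch.nn.Module):
    """Element-wise max over instance features, then a multi-label classifier."""
    def __init__(self, feat_dim: int = 512, num_classes: int = 20):
        super().__init__()
        self.classifier = torch.nn.Linear(feat_dim, num_classes)

    def forward(self, instance_feats: torch.Tensor) -> torch.Tensor:
        image_feat, _ = instance_feats.max(dim=0)   # (feat_dim,) bag summary
        return self.classifier(image_feat)          # (num_classes,) logits

# Usage: logits = MaxPoolAggregator()(bag_of_instances(image, boxes));
# train with a per-label sigmoid and binary cross-entropy.
```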
“…Current methods for multi-label image classification usually employ object localization techniques [28,30] or resort to visual attention networks [34] to locate semantic object regions. However, object localization techniques [23,35] have to search through numerous category-agnostic and redundant proposals and can hardly be integrated into deep neural networks for end-to-end training, while visual attention networks can only locate object regions roughly due to the lack of supervision or guidance.…”
Section: Introduction
confidence: 99%
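For contrast with the proposal pipeline, here is a generic spatial-attention head of the kind this statement alludes to. It is not the architecture of [34]; it only shows why such attention is weakly localized: the attention map is learned from the classification loss alone, with no box-level supervision.

```python
import torch

class SpatialAttentionHead(torch.nn.Module):
    """Attention-weighted pooling over a conv feature map; trained only by the
    classification loss, so the learned map localizes objects only roughly."""
    def __init__(self, channels: int = 512, num_classes: int = 20):
        super().__init__()
        self.attn = torch.nn.Conv2d(channels, 1, kernel_size=1)
        self.classifier = torch.nn.Linear(channels, num_classes)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:     # (B, C, H, W)
        w = torch.softmax(self.attn(fmap).flatten(2), dim=-1)  # (B, 1, H*W)
        pooled = (fmap.flatten(2) * w).sum(dim=-1)             # (B, C)
        return self.classifier(pooled)                         # (B, num_classes)
```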
“…All of the aforementioned methods extract features from the whole image with no spatial information, which on the one hand leaves them unable to explicitly associate the detected classification labels with the corresponding image regions, and on the other hand makes them extremely vulnerable to complex backgrounds. To overcome this issue, some researchers propose to exploit object proposals so as to focus only on the informative regions, which effectively eliminates the influence of non-object areas and thus yields significant improvements on the multi-label image recognition task [30,27]. More specifically, Wei et al. [27] propose a Hypotheses-CNN-Pooling framework that aggregates the label scores of each object hypothesis to obtain the final multi-label predictions.…”
Section: Multi-label Image Recognition
confidence: 99%
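The Hypotheses-CNN-Pooling idea above aggregates at the score level rather than the feature level. Below is a minimal sketch of that pooling step, under the assumption that each hypothesis has already been scored by a shared classifier; this illustrates only the pooling, not Wei et al.'s full framework.

```python
import torch

def cross_hypothesis_max_pooling(proposal_logits: torch.Tensor) -> torch.Tensor:
    """proposal_logits: (P, num_classes), one score vector per hypothesis.
    Max pooling keeps, for each label, its strongest supporting hypothesis,
    so a label can fire even if its object covers only a small image region."""
    image_logits, _ = proposal_logits.max(dim=0)
    return image_logits   # (num_classes,) image-level scores
```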
“…Yang et al. [30] formulate multi-label image recognition as a multi-class multi-instance learning problem, incorporating local information and enhancing the discriminative power of the features by encoding label-view information. However, these proposal-based methods are generally inefficient, with the preprocessing step of object proposal generation being the bottleneck.…”
Section: Multi-label Image Recognition
confidence: 99%
“…Therefore, we aim to utilize and incorporate these two essential types of information in our inpainting process. To effectively determine them for input images, we can pretrain state-of-the-art multi-label image classification models [20][21][22][23][24] and image semantic segmentation models [25][26][27][28][29] on auxiliary labeled datasets and incorporate them into the image inpainting process. The auxiliary labeled datasets need not contain exactly the same images as those to be inpainted, as long as the images are in similar categories.…”
Section: Introduction
confidence: 99%