2017
DOI: 10.1007/s11263-017-1006-x

DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers

Abstract: In this paper, a new method for generating object and action proposals in images and videos is proposed. It builds on activations of different convolutional layers of a pretrained CNN, combining the localization accuracy of the early layers with the high informativeness (and hence recall) of the later layers. To this end, we build an inverse cascade that, going backward from the later to the earlier convolutional layers of the CNN, selects the most promising locations and refines them in a coarse-to-fine manne…

Citations: cited by 19 publications (6 citation statements)
References: 37 publications (60 reference statements)
“…When first created, the focus of this network was to classify materials by their textural appearance and not by their colour. Due to the excellent generalisation performance of VGG-Net, its pre-trained model on the ImageNet dataset is widely used for feature extraction problems [9,13] such as: object candidate frame (object proposal) generation [15], fine-grained object localization, image retrieval [34], image co-localization [35], etc. On the other hand, our new approach is based on modifying the concept of CRNN [29].…”
Section: Approaches For Historical Handwriting Digit String Recognition
Citation type: mentioning
confidence: 99%
“…Cascaded Architectures. There have been several attempts [6,35,36,37,38,39] that apply cascade architecture to reject easy samples at early layers or stages, and regress bounding boxes iteratively for progressive refinement. However, none of them are designed for one-stage detectors.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…• Cascaded Architectures: our residual objectness is supposed to progressively address the foreground-background imbalance, which is similar with recent cascaded architectures [6,35,36,37,38,39] progressively refine bounding boxes. However, most of them [6,35,36,37,38] are only applicable for the per-region stage, whereas the only exception C-RPN [39] is designed for object tracking. Our proposed mechanism is generalized for both region-based and one-stage detectors.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In (Krahenbuhl and Koltun, 2015), a learning method is proposed by training an ensemble of figure-ground segmentation models jointly, where individual models can specialize and complement each other. In recent years, CNN-based approaches (Hayder et al, 2016; Ghodrati et al, 2016; Pont-Tuset and Gool, 2015; He and Lau, 2015) are more popular with a nontrivial margin of performance boost. Jie et al (Jie et al, 2016) proposed a scale-aware pixel-wise proposal framework where two separate networks are learned to handle large and small objects, respectively.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Object proposal is the task of proposing a set of candidate regions or bounding boxes in an image that may potentially contain an object. In recent years, the emergence of object proposal algorithms (Uijlings et al, 2013; Manén et al, 2013; Arbeláez et al, 2014; Hayder et al, 2016; Kong et al, 2016; Ghodrati et al, 2016; Chavali et al, 2016; Sun et al, 2016; have significantly boosted the development of many vision tasks, (Liu et al, 2017a,b; Li et al, 2016; Chi et al, 2016;, especially for object detection (Girshick et al, 2014; Dai et al, 2016; Girshick, 2015; Bell et al, 2016; Liu et al, 2016). It is verified by Hosang et al. (Hosang et al, 2015) that region proposals with high average recall correlates well with good performance of a detector.…”
Section: Introduction
Citation type: mentioning
confidence: 98%