DeepProposals: Hunting Objects and Actions by Cascading Deep Convolutional Layers

Ghodrati, Amir; Diba, Ali; Pedersoli, Marco; Tuytelaars, Tinne; Gool, Luc Van

doi:10.1007/s11263-017-1006-x

Cited by 19 publications

(6 citation statements)

References 37 publications

(60 reference statements)

“…When first created, the focus of this network was to classify materials by their textural appearance and not by their colour. Due to the excellent generalisation performance of VGG-Net, its pre-trained model on the ImageNet dataset is widely used for feature extraction problems [9,13] such as: object candidate frame (object proposal) generation [15], fine-grained object localization, image retrieval [34], image co-localization [35], etc. On the other hand, our new approach is based on modifying the concept of CRNN [29].…”

Section: Approaches For Historical Handwriting Digit String Recognition (mentioning)

confidence: 99%

End-to-End Approach for Recognition of Historical Digit Strings

Zhao, Hochuli, Cheddad. 2021. Preprint


The plethora of digitalised historical document datasets released in recent years has rekindled interest in advancing the field of handwriting pattern recognition. In the same vein, a recently published data set, known as ARDIS, presents handwritten digits manually cropped from 15,000 scanned documents of Swedish church books and exhibiting various handwriting styles. To this end, we propose an end-to-end segmentation-free deep learning approach to handle this challenging ancient handwriting style of dates present in the ARDIS dataset (4-digit strings). We show that with slight modifications in the VGG-16 deep model, the framework can achieve a recognition rate of 93.2%, resulting in a feasible solution free of heuristic methods, segmentation, and fusion methods. Moreover, the proposed approach outperforms the well-known CRNN method (a model widely applied in handwriting recognition tasks).


“…Cascaded Architectures. There have been several attempts [6,35,36,37,38,39] that apply cascade architecture to reject easy samples at early layers or stages, and regress bounding boxes iteratively for progressive refinement. However, none of them are designed for one-stage detectors.…”

Section: Related Work (mentioning)

confidence: 99%

“…• Cascaded Architectures: our residual objectness is supposed to progressively address the foreground-background imbalance, which is similar with recent cascaded architectures [6,35,36,37,38,39] progressively refine bounding boxes. However, most of them [6,35,36,37,38] are only applicable for the per-region stage, whereas the only exception C-RPN [39] is designed for object tracking. Our proposed mechanism is generalized for both region-based and one-stage detectors.…”

Section: Related Work (mentioning)

confidence: 99%

Residual Objectness for Imbalance Reduction

Chen, Liu, Luo et al. 2019. Preprint


For a long time, object detectors have suffered from extreme imbalance between foregrounds and backgrounds. While several sampling/reweighting schemes have been explored to alleviate the imbalance, they are usually heuristic and demand laborious hyper-parameter tuning, which makes optimality hard to achieve. In this paper, we first reveal that such imbalance could be addressed in a learning-based manner. Guided by this illuminating observation, we propose a novel Residual Objectness (ResObj) mechanism that addresses the imbalance by end-to-end optimization, while no further hand-crafted sampling/reweighting is required. Specifically, by applying multiple cascaded objectness-related modules with residual connections, we formulate an elegant consecutive refinement procedure for distinguishing the foregrounds from backgrounds, thereby progressively addressing the imbalance. Extensive experiments demonstrate the effectiveness of our method, as well as its compatibility and adaptivity for both region-based and one-stage detectors, namely, the RetinaNet-ResObj, YOLOv3-ResObj and FasterRCNN-ResObj achieve relative 3.6%, 3.9%, 3.2% Average Precision (AP) improvements compared with their vanilla models on COCO, respectively.


“…In (Krahenbuhl and Koltun, 2015), a learning method is proposed by training an ensemble of figure-ground segmentation models jointly, where individual models can specialize and complement each other. In recent years, CNN-based approaches (Hayder et al, 2016; Ghodrati et al, 2016; Pont-Tuset and Gool, 2015; He and Lau, 2015) are more popular with a nontrivial margin of performance boost. Jie et al (Jie et al, 2016) proposed a scale-aware pixel-wise proposal framework where two separate networks are learned to handle large and small objects, respectively.…”

Section: Related Work (mentioning)

confidence: 99%

“…Object proposal is the task of proposing a set of candidate regions or bounding boxes in an image that may potentially contain an object. In recent years, the emergence of object proposal algorithms (Uijlings et al, 2013; Manén et al, 2013; Arbeláez et al, 2014; Hayder et al, 2016; Kong et al, 2016; Ghodrati et al, 2016; Chavali et al, 2016; Sun et al, 2016; have significantly boosted the development of many vision tasks, (Liu et al, 2017a,b; Li et al, 2016; Chi et al, 2016;, especially for object detection (Girshick et al, 2014; Dai et al, 2016; Girshick, 2015; Bell et al, 2016; Liu et al, 2016). It is verified by Hosang et al. (Hosang et al, 2015) that region proposals with high average recall correlates well with good performance of a detector.…”

Section: Introduction (mentioning)

confidence: 98%

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

Liu, Ouyang et al. 2018. Int J Comput Vis


In this paper, we propose a zoom-out-and-in network for generating object proposals. A key observation is that it is difficult to classify anchors of different sizes with the same set of features. Anchors of different sizes should be placed accordingly based on different depth within a network: smaller boxes on high-resolution layers with a smaller stride while larger boxes on low-resolution counterparts with a larger stride. Inspired by the conv/deconv structure, we fully leverage the low-level local details and high-level regional semantics from two feature map streams, which are complementary to each other, to identify the objectness in an image. A map attention decision (MAD) unit is further proposed to aggressively search for neuron activations among two streams and attend the most contributive ones on the feature learning of the final loss. The unit serves as a decision-maker to adaptively activate maps along certain channels with the sole purpose of optimizing the overall training loss. One advantage of MAD is that the learned weights enforced on each feature channel are predicted on-the-fly based on the input context, which is more suitable than the fixed enforcement of a convolutional kernel. Experimental results on three datasets, including PASCAL VOC 2007, ImageNet DET, MS COCO, demonstrate the effectiveness of our proposed algorithm over other state-of-the-art methods, in terms of average recall (AR) for region proposal and average precision (AP) for object detection.
