Recurrent Scale Approximation for Object Detection in CNN

Liu, Yu; Li, Hongyang; Yan, Junjie; Wei, Fangyin; Wang, Xiaogang; Tang, Xiaoou

doi:10.1109/iccv.2017.69

Cited by 97 publications

(56 citation statements)

References 36 publications


“…Since state-of-the-art 2D detectors [20,17,13,16,15] can provide reliable 2D bounding boxes for objects, several works use 2D box as a prior to reduce the search region of 3D box [1,18]. [1] uses a CNN to predict the parts coordinates, visibility and template similarity based on the 2D box, and match the best corresponding 3D template.…”

Section: Related Work (mentioning)

confidence: 99%

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving

Ouyang, Sheng, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite


We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving. Our efforts are put on extracting the underlying 3D information in a 2D image and determining the accurate 3D bounding box of the object without point cloud or stereo data. Leveraging the off-the-shelf 2D object detector, we propose an artful approach to efficiently obtain a coarse cuboid for each predicted 2D box. The coarse cuboid has enough accuracy to guide us to determine the 3D box of the object by refinement. In contrast to previous state-of-the-art methods that only use the features extracted from the 2D bounding box for box refinement, we explore the 3D structure information of the object by employing the visual features of visible surfaces. The new features from surfaces are utilized to eliminate the problem of representation ambiguity brought by only using a 2D bounding box. Moreover, we investigate different methods of 3D box refinement and discover that a classification formulation with quality aware loss has much better performance than regression. Evaluated on the KITTI benchmark, our approach outperforms current state-of-the-art methods for single RGB image based 3D object detection.


“…It treats aligned and unaligned images separately, thereby using a context-switching technique for a given input image. Images are aligned using the co-ordinates provided with the dataset along with MTCNN and Recurrent Scale Approximation (RSA) [41]. Features learned by the CNNs are directly used for classification.…”

Section: (iii) Deep Disguise Recognizer Network (DDRNet) [27] (mentioning)

confidence: 99%

Recognizing Disguised Faces in the Wild

Singh, Vatsa, et al. 2019

IEEE Trans. Biom. Behav. Identity Sci.


Research in face recognition has seen tremendous growth over the past couple of decades. Beginning from algorithms capable of performing recognition in constrained environments, the current face recognition systems achieve very high accuracies on large-scale unconstrained face datasets. While upcoming algorithms continue to achieve improved performance, a majority of the face recognition systems are susceptible to failure under disguise variations, one of the most challenging covariates of face recognition. In the literature, some algorithms demonstrate promising results on existing disguise datasets; however, most of the disguise datasets contain images with limited variations, often captured in controlled settings. This does not simulate a real world scenario, where both intentional and unintentional unconstrained disguises are encountered by a face recognition system. In this paper, a novel Disguised Faces in the Wild (DFW) dataset is proposed which contains over 11,000 images of 1,000 identities with variations across different types of disguise accessories. The dataset is collected from the Internet, resulting in unconstrained face images similar to real world settings. This is the first-of-a-kind dataset with the availability of impersonator and genuine obfuscated face images for each subject. The proposed DFW dataset has been analyzed in terms of three levels of difficulty: (i) easy, (ii) medium, and (iii) hard in order to showcase the challenging nature of the problem. It is our view that the research community can greatly benefit from the DFW dataset in terms of developing algorithms robust to such adversaries. The proposed dataset was released as part of the First International Workshop and Competition on Disguised Faces in the Wild at the International Conference on Computer Vision and Pattern Recognition, 2018. This paper presents the DFW dataset in detail, including the evaluation protocols, baseline results, performance analysis of the submissions received as part of the competition, and three levels of difficulties of the DFW challenge dataset.
Index Terms: Face Recognition, Disguise in the Wild. arXiv:1811.08837v1 [cs.CV]


“…On AFW, our algorithm achieves an AP of 99.94% using RPN+S²AP. On FDDB, RPN+S²AP recalls 93.59% faces with 50 false positive higher than [19] which also utilizes the scale information and on MALF our method recalls 77.92% faces with zeros false positive. Note that the shape and scale definition of bounding box on each benchmark varies.…”

Section: Comparing With State-of-the-art (mentioning)

confidence: 89%

Beyond Trade-Off: Accelerate FCN-Based Face Detector with Higher Accuracy

Song, Liu, Jiang, et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Self Cite


Fully convolutional neural network (FCN) has been dominating the game of face detection task for a few years with its congenital capability of sliding-window-searching with shared kernels, which boiled down all the redundant calculation, and most recent state-of-the-art methods such as Faster-RCNN, SSD, YOLO and FPN use FCN as their backbone. So here comes one question: Can we find a universal strategy to further accelerate FCN with higher accuracy, and thereby accelerate all the recent FCN-based methods? To analyze this, we decompose the face searching space into two orthogonal directions, 'scale' and 'spatial'. Only a few coordinates in the space expanded by the two base vectors indicate foreground. So if FCN could ignore most of the other points, the searching space and false alarm should be significantly boiled down. Based on this philosophy, a novel method named scale estimation and spatial attention proposal (S²AP) is proposed to pay attention to some specific scales and valid locations in image pyramid. Furthermore, we adopt a masked-convolution operation based on the attention result to accelerate FCN calculation. Experiments show that FCN-based method RPN can be accelerated by about 4× with the help of S²AP and masked-FCN, and at the same time it can also achieve the state-of-the-art on FDDB, AFW and MALF face detection benchmarks.
