Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction

Zhang, Yuting; Sohn, Kihyuk; Villegas, Ruben; Pan, Gang; Lee, Honglak

doi:10.1109/cvpr.2015.7298621

Cited by 162 publications

(85 citation statements)

References 39 publications


“…Under 07 + 12 train val, VGG16 has achieved up to 2.1% mAP improvement. Moreover, compared to other typical region-based detectors, such as AC-CNN [9], Yuting [15], MR-CNN [1], the proposed approach yields competitive performance as well. OHEM [12] is the state-of-the-art object detection approach, which has introduced online bootstrapping to the design of network structure based on the FastRCNN framework.…”

Section: Methods (mentioning)

confidence: 92%

Improving object detection with region similarity learning

Gag, Lou, Wang, et al. 2017

2017 IEEE International Conference on Multimedia and Expo (ICME)


Object detection aims to identify instances of semantic objects of a certain class in images or videos. The success of state-of-the-art approaches is attributed to the significant progress of object proposal and convolutional neural networks (CNNs). Most promising detectors involve multi-task learning with an optimization objective of softmax loss and regression loss. The first is for multi-class categorization, while the latter is for improving localization accuracy. However, few of them attempt to further investigate the hardness of distinguishing different sorts of distracting background regions (i.e., negatives) from true object regions (i.e., positives). To improve the performance of classifying positive object regions vs. a variety of negative background regions, we propose to incorporate triplet embedding into the learning objective. The triplet units are formed by assigning each negative region to a meaningful object class and establishing class-specific negatives, followed by triplet construction. Over the benchmark PASCAL VOC 2007, the proposed triplet embedding has improved the performance of the well-known FastRCNN model with a mAP gain of 2.1%. In particular, the state-of-the-art approach OHEM can benefit from the triplet embedding and has achieved a mAP improvement of 1.2%.
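The abstract above does not give the exact form of the triplet objective, so the following is only a minimal sketch of a generic margin-based triplet loss, assuming Euclidean distance between region embeddings. The function names, the margin value, and the toy two-dimensional embeddings are all hypothetical and are not taken from the cited paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge-style triplet loss (an assumption, not the paper's formula).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings: an object region, another region of the same class,
# and two background regions assigned to that class as negatives.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
hard_negative = [0.0, 1.5]   # close to the anchor, still violates the margin
easy_negative = [0.0, 5.0]   # far from the anchor, contributes no loss

hard_loss = triplet_loss(anchor, positive, hard_negative)
easy_loss = triplet_loss(anchor, positive, easy_negative)
```

In this sketch only the hard negative produces a non-zero loss, which is the usual motivation for pairing each positive with negatives that are difficult to separate from it.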

“…For example, using our method can enable training directly for other rank-based metrics used in information retrieval, such as discounted cumulative gain [17]. Moreover, we do not require a potentially expensive max-oracle to find the most-violating inputs with respect to the model and loss, as required by [18,19,2].…”

Section: Introduction (mentioning)

confidence: 99%

“…Like our method, this is a structured loss involving IoU of detections and ground-truth objects; however, it does not correspond to maximising AP, and only a single detection is returned in each image, so there is no NMS. More recently, [2] uses the same structured SVM loss, but with a CNN in place of a kernelised linear model over SURF features [26]. This work directly optimises the structured SVM loss via gradient descent, allowing backpropagation to update the nonlinear CNN layers.…”

Section: Related Work (mentioning)

confidence: 99%
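The statement above describes a structured SVM loss built on the IoU between detections and ground-truth boxes. As a rough illustration only, the sketch below computes IoU for axis-aligned boxes and a margin-rescaled structured hinge over a small candidate set. The task loss 1 − IoU, the finite candidate list, and all names are assumptions made for this example; the cited works may define these differently.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def structured_hinge(candidates, gt_box, gt_score):
    """Margin-rescaled structured hinge over scored candidate boxes.

    candidates: list of (box, score) pairs produced by some scoring model.
    The task loss is assumed to be 1 - IoU with the ground-truth box.
    """
    worst = max((1.0 - iou(box, gt_box)) + score - gt_score
                for box, score in candidates)
    return max(0.0, worst)


gt_box = (0.0, 0.0, 10.0, 10.0)
candidates = [
    ((0.0, 0.0, 10.0, 10.0), 2.0),    # the ground-truth box itself
    ((5.0, 0.0, 15.0, 10.0), 1.8),    # half-overlapping box with a high score
    ((20.0, 20.0, 30.0, 30.0), 0.5),  # disjoint box with a low score
]
overlap = iou(candidates[1][0], gt_box)
loss = structured_hinge(candidates, gt_box, gt_score=2.0)
```

Here the half-overlapping box is the most-violating candidate: its score is close to the ground-truth score while its IoU is only one third, so it determines the loss.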


End-to-End Training of Object Class Detectors for Mean Average Precision

Henderson, Ferrari 2017

Computer Vision – ACCV 2016


Abstract. We present a method for training CNN-based object class detectors directly using mean average precision (mAP) as the training loss, in a truly end-to-end fashion that includes non-maximum suppression (NMS) at training time. This contrasts with the traditional approach of training a CNN for a window classification loss, then applying NMS only at test time, when mAP is used as the evaluation metric in place of classification accuracy. However, mAP following NMS forms a piecewise-constant structured loss over thousands of windows, with gradients that do not convey useful information for gradient descent. Hence, we define new, general gradient-like quantities for piecewise constant functions, which have wide applicability. We describe how to calculate these efficiently for mAP following NMS, enabling us to train a detector based on Fast R-CNN [1] directly for mAP. This model achieves equivalent performance to the standard Fast R-CNN on the PASCAL VOC 2007 and 2012 datasets, while being conceptually more appealing as the very same model and loss are used at both training and test time.
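To illustrate why the abstract calls this loss piecewise constant, the sketch below computes a simplified, non-interpolated average precision for one class and shows that it changes only when the ranking of detections changes. It deliberately omits NMS and the interpolated AP used by the PASCAL VOC protocol, and it assumes each detection has already been labelled as a true or false positive; the function name and toy data are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each true positive.

    scores: detection confidences; labels: 1 for true positive, 0 for false.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        return 0.0
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / num_pos


labels = [1, 0, 1, 0]
ap_base = average_precision([0.9, 0.8, 0.70, 0.6], labels)
# Raising a score without changing the ranking leaves AP untouched,
# so the derivative of AP with respect to that score is zero here.
ap_nudged = average_precision([0.9, 0.8, 0.75, 0.6], labels)
# Raising it far enough to overtake a false positive makes AP jump.
ap_swapped = average_precision([0.9, 0.8, 0.85, 0.6], labels)
```

Because AP is flat between such jumps, ordinary gradients carry no signal, which is the difficulty the abstract says its gradient-like quantities are meant to address.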


“…These feature vectors are then compared to a fine-tuned pre-trained model to score regions and find the best class for each object (Girshick, et al, 2015). Zhang, et al (2015) proposed two search algorithms to localize objects with high accuracy based on Bayesian optimization and also a deep learning framework based on a structured SVM objective function and CNN classifier. The results on PASCAL VOC 2007 and 2012 benchmarks highlight the significant improvement on detection performance.…”

Section: Introduction (mentioning)

confidence: 99%

Knowledge Based 3d Building Model Recognition Using Convolutional Neural Networks From Lidar and Aerial Imageries

Alidoost, Arefi 2016

Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.


Abstract: In recent years, with the development of high-resolution data acquisition technologies, many different approaches and algorithms have been presented to extract accurate and timely updated 3D models of buildings as a key element of city structures for numerous applications in urban mapping. In this paper, a novel and model-based approach is proposed for automatic recognition of buildings' roof models such as flat, gable, hip, and pyramid hip roof models based on deep structures for hierarchical learning of features that are extracted from both LiDAR and aerial ortho-photos. The main steps of this approach include building segmentation, feature extraction and learning, and finally building roof labeling in a supervised pre-trained Convolutional Neural Network (CNN) framework to have an automatic recognition system for various types of buildings over an urban area. In this framework, the height information provides invariant geometric features for the convolutional neural network to localize the boundary of each individual roof. CNN is a kind of feed-forward neural network with the multilayer perceptron concept which consists of a number of convolutional and subsampling layers in an adaptable structure, and it is widely used in pattern recognition and object detection applications. Since the training dataset is a small library of labeled models for different shapes of roofs, the computation time of learning can be decreased significantly using the pre-trained models. The experimental results highlight the effectiveness of the deep learning approach to detect and extract the pattern of buildings' roofs automatically considering the complementary nature of height and RGB information.
