Jitendra Malik scite author profile

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at

Scale-space and edge detection using anisotropic diffusion

Perona

Malik

1990

IEEE Trans. Pattern Anal. Machine Intell.

10,458

6,584

Abstract-The scale-space technique introduced by Witkin involves generating coarser resolution images by convolving the original image with a Gaussian kernel. This approach has a major drawback: it is difficult to obtain accurately the locations of the "semantically meaningful" edges at coarse scales. In this paper we suggest a new definition of scale-space, and introduce a class of algorithms that realize it using a diffusion process. The diffusion coefficient is chosen to vary spatially in such a way as to encourage intraregion smoothing in preference to interregion smoothing. It is shown that the "no new maxima should be generated at coarse scales" property of conventional scale space is preserved. As the region boundaries in our approach remain sharp, we obtain a high quality edge detector which successfully exploits global information. Experimental results are shown on a number of images. The algorithm involves elementary, local operations replicated over the image making parallel hardware implementations feasible.

Index Terms-Adaptive filtering, analog VLSI, edge detection, edge enhancement, nonlinear diffusion, nonlinear filtering, parallel algorithm, scale-space.
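The diffusion process described in this abstract can be illustrated with a minimal NumPy sketch. It assumes an explicit 4-neighbour update and an exponential conductance that falls off with the local intensity difference; the function name `perona_malik` and the parameters `n_iter`, `kappa`, and `step` are illustrative choices, not taken from the abstract.

```python
import numpy as np

def perona_malik(image, n_iter=20, kappa=0.1, step=0.2):
    """Explicit 4-neighbour diffusion with an edge-stopping conductance.

    kappa (assumed name) sets the intensity difference treated as an edge;
    step must stay at or below 0.25 for this 4-neighbour scheme to be stable.
    """
    u = np.asarray(image, dtype=float).copy()
    for _ in range(n_iter):
        # Differences to the four neighbours; replicated borders give zero flux.
        p = np.pad(u, 1, mode="edge")
        diffs = (
            p[:-2, 1:-1] - u,  # north
            p[2:, 1:-1] - u,   # south
            p[1:-1, :-2] - u,  # west
            p[1:-1, 2:] - u,   # east
        )
        # Conductance exp(-(d / kappa)^2): close to 1 inside smooth regions,
        # close to 0 across strong edges, so smoothing stays within regions.
        flux = sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
        u += step * flux
    return u
```

Because each update is a convex combination of a pixel and its neighbours, the output never exceeds the range of the input, which mirrors the "no new maxima" property mentioned in the abstract.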

Contour Detection and Hierarchical Image Segmentation

Arbeláez

Maire

Fowlkes

et al. 2011

IEEE Trans. Pattern Anal. Mach. Intell.

4,491

4,029

Abstract-This paper investigates two fundamental problems in computer vision: contour detection and image segmentation. We present state-of-the-art algorithms for both of these tasks. Our contour detector combines multiple local cues into a globalization framework based on spectral clustering. Our segmentation algorithm consists of generic machinery for transforming the output of any contour detector into a hierarchical region tree. In this manner, we reduce the problem of image segmentation to that of contour detection. Extensive experimental evaluation demonstrates that both our contour detection and segmentation methods significantly outperform competing algorithms. The automatically generated hierarchical segmentations can be interactively refined by user-specified annotations. Computation at multiple image resolutions provides a means of coupling our system to recognition applications.

Shape matching and object recognition using shape contexts

Belongie

Malik

Puzicha

2002

IEEE Trans. Pattern Anal. Machine Intell.

5,516

3,944

A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics

Martín

et al.

This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

SlowFast Networks for Video Recognition

et al. 2019

We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast.
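The two frame rates described in this abstract can be illustrated with a small sketch of how frame indices might be chosen for each pathway. The function name `slowfast_frame_indices` and the parameters `fast_stride` and `alpha` (the ratio between the Fast and Slow frame rates) are assumptions made for illustration; the abstract does not give specific values.

```python
def slowfast_frame_indices(num_frames, fast_stride=2, alpha=8):
    """Pick frame indices for the two pathways of a clip.

    fast_stride and alpha are illustrative values, not taken from the abstract.
    The Fast pathway samples densely; the Slow pathway keeps every alpha-th
    frame of the Fast pathway, so it runs at a frame rate alpha times lower.
    """
    fast = list(range(0, num_frames, fast_stride))
    slow = fast[::alpha]
    return slow, fast


# Example: a 64-frame clip.
slow, fast = slowfast_frame_indices(64)
print(len(slow), len(fast))  # 4 32
```

In this sketch the Slow pathway sees 4 frames and the Fast pathway 32; the lightweight design mentioned in the abstract would then come from giving the Fast pathway fewer channels, which is not modelled here.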

End-to-End Recovery of Human Shape and Pose

et al. 2018

Learning to detect natural image boundaries using local brightness, color, and texture cues

Martín

Fowlkes

Malik

2004

IEEE Trans. Pattern Anal. Machine Intell.

2,132

1,838

The goal of this work is to accurately detect and localize boundaries in natural scenes using local image measurements. We formulate features that respond to characteristic changes in brightness, color, and texture associated with natural boundaries. In order to combine the information from these features in an optimal way, we train a classifier using human labeled images as ground truth. The output of this classifier provides the posterior probability of a boundary at each image location and orientation. We present precision-recall curves showing that the resulting detector significantly outperforms existing approaches. Our two main results are 1) that cue combination can be performed adequately with a simple linear model and 2) that a proper, explicit treatment of texture is required to detect boundaries in natural images.
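The cue combination step can be sketched as a linear classifier whose output is read as a boundary posterior. The sketch below assumes logistic regression fitted by plain gradient descent as the "simple linear model"; the function names, the learning rate, and the layout of the feature matrix (one row per pixel, one column per brightness, color, or texture cue) are hypothetical.

```python
import numpy as np

def _with_bias(features):
    # Append a constant column so the model learns an offset term.
    features = np.asarray(features, dtype=float)
    return np.hstack([features, np.ones((len(features), 1))])

def fit_cue_combination(features, labels, lr=0.5, n_iter=500):
    """Fit logistic-regression weights by gradient descent (illustrative)."""
    x = _with_bias(features)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-x @ w))
        w -= lr * x.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w

def boundary_posterior(features, w):
    """Posterior probability of a boundary for each row of cue responses."""
    return 1.0 / (1.0 + np.exp(-_with_bias(features) @ w))
```

Here `labels` would come from the human-labeled boundary maps, with 1 marking a boundary pixel and 0 a non-boundary pixel.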

