Huaizu Jiang scite author profile

Salient object detection has been attracting a lot of interest, and recently various heuristic computational models have been designed. In this paper, we formulate saliency map computation as a regression problem. Our method, which is based on multi-level image segmentation, uses supervised learning to map the regional feature vector to a saliency score. Saliency scores across multiple levels are finally fused to produce the saliency map. Our contributions are two-fold. First, we propose a discriminative regional feature integration approach for salient object detection. Compared with existing heuristic models, our proposed method is able to automatically integrate high-dimensional regional saliency features and choose discriminative ones. Second, by investigating standard generic region properties as well as two widely studied concepts for salient object detection, i.e., regional contrast and backgroundness, our approach significantly outperforms state-of-the-art methods on six benchmark datasets. Meanwhile, we demonstrate that our method runs as fast as most existing algorithms.
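The regression formulation described above can be illustrated with a small sketch. The abstract does not name the regressor or the fusion rule, so the ridge-regularized linear model and the plain averaging across segmentation levels used here are stand-in assumptions, and every function name is hypothetical.

```python
import numpy as np

def fit_linear_regressor(features, scores, ridge=1e-6):
    """Fit a ridge-regularized linear map from region features (n, d) to scores (n,).

    Assumption: a linear model stands in for whatever supervised regressor is used.
    """
    x = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ scores)

def predict_region_scores(weights, features):
    """Predict one saliency score per region, clipped to [0, 1]."""
    x = np.hstack([features, np.ones((features.shape[0], 1))])
    return np.clip(x @ weights, 0.0, 1.0)

def region_scores_to_map(labels, region_scores):
    """Paint each region's score onto its pixels; labels holds integer region ids."""
    return region_scores[labels]

def fuse_levels(level_maps):
    """Fuse per-level saliency maps by averaging (an assumed fusion rule)."""
    return np.mean(np.stack(level_maps, axis=0), axis=0)
```

In use, one would segment the image at several levels, predict a score for every region at each level, convert each level to a per-pixel map with `region_scores_to_map`, and pass the list of maps to `fuse_levels`.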

Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation

Jiang, Sun, Jampani et al. 2018

682

758

Given two consecutive frames, video interpolation aims at generating intermediate frame(s) to form both spatially and temporally coherent video sequences. While most existing methods focus on single-frame interpolation, we propose an end-to-end convolutional neural network for variable-length multi-frame video interpolation, where the motion interpretation and occlusion reasoning are jointly modeled. We start by computing bi-directional optical flow between the input images using a U-Net architecture. These flows are then linearly combined at each time step to approximate the intermediate bi-directional optical flows. These approximate flows, however, only work well in locally smooth regions and produce artifacts around motion boundaries. To address this shortcoming, we employ another U-Net to refine the approximated flow and also predict soft visibility maps. Finally, the two input images are warped and linearly fused to form each intermediate frame. By applying the visibility maps to the warped images before fusion, we exclude the contribution of occluded pixels to the interpolated intermediate frame to avoid artifacts. Since none of our learned network parameters are time-dependent, our approach is able to produce as many intermediate frames as needed. To train our network, we use 1,132 240-fps video clips, containing 300K individual video frames. Experimental results on several datasets, predicting different numbers of interpolated frames, demonstrate that our approach performs consistently better than existing methods.
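The two linear steps in this pipeline, combining the bi-directional flows at a time step and fusing the warped frames under visibility maps, can be sketched in NumPy. The abstract only says the flows are "linearly combined" and the warped images "linearly fused"; the specific coefficients below are an assumption about the formulation, and the function names, array shapes, and `eps` guard are hypothetical. The U-Nets and the warping itself are omitted.

```python
import numpy as np

def approximate_intermediate_flows(flow_01, flow_10, t):
    """Approximate flows from time t (0 < t < 1) back to frame 0 and frame 1.

    flow_01 and flow_10 are (H, W, 2) flows between the two input frames.
    The quadratic-in-t coefficients are an assumed linear combination.
    """
    flow_t0 = -(1.0 - t) * t * flow_01 + t * t * flow_10
    flow_t1 = (1.0 - t) ** 2 * flow_01 - t * (1.0 - t) * flow_10
    return flow_t0, flow_t1

def fuse_warped_frames(warped_0, warped_1, vis_0, vis_1, t, eps=1e-8):
    """Blend two already-warped frames, down-weighting occluded pixels.

    vis_0 and vis_1 are soft visibility maps in [0, 1], broadcastable to the
    image shape. Temporal weights favor the input frame closer to time t.
    """
    w0 = (1.0 - t) * vis_0
    w1 = t * vis_1
    return (w0 * warped_0 + w1 * warped_1) / (w0 + w1 + eps)
```

Because neither function has parameters tied to a particular `t`, the same code can be called for any number of time steps between the two inputs, which mirrors the abstract's point about producing as many intermediate frames as needed.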

Automatic salient object segmentation based on context and shape prior

Jiang, Wang, Yuan et al. 2011

397

333

We propose a novel automatic salient object segmentation algorithm which integrates both bottom-up salient stimuli and an object-level shape prior, i.e., that a salient object has a well-defined closed boundary. Our approach is formalized as an iterative energy minimization framework, leading to binary segmentation of the salient object. The energy minimization is initialized with a saliency map computed through context analysis based on multi-scale superpixels. The object-level shape prior is then extracted by combining saliency with object boundary information. Both the saliency map and the shape prior are updated after each iteration. Experimental results on two public benchmark datasets show that our proposed approach outperforms state-of-the-art methods.

Face Detection with the Faster R-CNN

2017

Salient object detection: A survey

et al. 2019

Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision. While many models have been proposed and several applications have emerged, a deep understanding of achievements and issues remains lacking. We aim to provide a comprehensive review of recent progress in salient object detection and situate this field among other closely related areas such as generic scene segmentation, object proposal generation, and saliency for fixation prediction. Covering 228 publications, we survey i) roots, key concepts, and tasks, ii) core techniques and main modeling trends, and iii) datasets and evaluation metrics for salient object detection. We also discuss open problems such as evaluation metrics and dataset bias in model performance, and suggest future research directions.

Salient Object Detection: A Benchmark

Borji, Cheng, Jiang et al. 2015

IEEE Trans. on Image Process.

1,201

305

We extensively compare, qualitatively and quantitatively, 41 state-of-the-art models (29 salient object detection, 10 fixation prediction, 1 objectness, and 1 baseline) over seven challenging data sets for the purpose of benchmarking salient object detection and segmentation methods. From the results obtained so far, our evaluation shows a consistent rapid progress over the last few years in terms of both accuracy and running time. The top contenders in this benchmark significantly outperform the models identified as the best in the previous benchmark conducted three years ago. We find that the models designed specifically for salient object detection generally work better than models in closely related areas, which in turn provides a precise definition and suggests an appropriate treatment of this problem that distinguishes it from other problems. In particular, we analyze the influences of center bias and scene complexity in model performance, which, along with the hard cases for the state-of-the-art models, provide useful hints toward constructing more challenging large-scale data sets and better saliency models. Finally, we propose probable solutions for tackling several open problems, such as evaluation scores and data set bias, which also suggest future research directions in the rapidly growing field of salient object detection.

In Defense of Grid Features for Visual Question Answering

et al. 2020

Popularized as 'bottom-up' attention [2], bounding box (or region) based visual features have recently surpassed vanilla grid-based convolutional features as the de facto standard for vision and language tasks like visual question answering (VQA). However, it is not clear whether the advantages of regions (e.g. better localization) are the key reasons for the success of bottom-up attention. In this paper, we revisit grid features for VQA, and find they can work surprisingly well, running more than an order of magnitude faster with the same accuracy (e.g. if pre-trained in a similar fashion). Through extensive experiments, we verify that this observation holds true across different VQA models (reporting a state-of-the-art accuracy on VQA 2.0 test-std, 72.71) and datasets, and generalizes well to other tasks like image captioning. As grid features make the model design and training process much simpler, this enables us to train them end-to-end and also use a more flexible network design. We learn VQA models end-to-end, from pixels directly to answers, and show that strong performance is achievable without using any region annotations in pre-training. We hope our findings help further improve the scientific understanding and the practical application of VQA. Code and features will be made available. (This work was done when Huaizu Jiang was an intern at FAIR. The terms 'region' and 'bounding box' are used interchangeably.)

Salient Object Detection: A Discriminative Regional Feature Integration Approach

et al. 2016
