Attentive Contexts for Object Detection

Li, Jianan; Wei, Yunchao; Liang, Xiaodan; Dong, Jian; Xu, Tingfa; Feng, Jiashi; Yan, Shuicheng

doi:10.1109/tmm.2016.2642789

Cited by 229 publications (119 citation statements)

References 41 publications (73 reference statements)

Mentioning: 112

“…Under 07 + 12 train val, VGG16 has achieved up to 2.1% mAP improvement. Moreover, compared to other typical region-based detectors, such as AC-CNN [9], Yuting [15], MR-CNN [1], the proposed approach yields competitive performance as well. OHEM [12] is the state-of-the-art object detection approach, which has introduced online bootstrapping to the design of network structure based on the FastRCNN framework.…”

Section: Methods (mentioning)

confidence: 92%

“…In FastRCNN [3], many hyperparameters are introduced for efficient learning, e.g., the thresholds to define foreground RoIs (regions of interest) and background RoIs, the sampling ratio of positive (foreground RoIs) and negative samples (background RoIs) in mini-batch stochastic gradient descent (SGD) optimization, etc. Li et al. [9] proposed to use LSTM cells [10] to capture local context information of proposal boxes and global context information of entire images to strengthen the discriminative ability of RoI features. In [11], Li et al. took advantage of multiple subnetworks' outputs to deal with large scale changes.…”

Section: Introduction (mentioning)

confidence: 99%

Improving object detection with region similarity learning

Gag, Lou, Wang, et al. 2017

2017 IEEE International Conference on Multimedia and Expo (ICME)

Object detection aims to identify instances of semantic objects of a certain class in images or videos. The success of state-of-the-art approaches is attributed to significant progress in object proposals and convolutional neural networks (CNNs). Most promising detectors involve multi-task learning with an optimization objective that combines a softmax loss and a regression loss. The former is for multi-class categorization, while the latter is for improving localization accuracy. However, few of them further investigate the difficulty of distinguishing different sorts of distracting background regions (i.e., negatives) from true object regions (i.e., positives). To improve the performance of classifying positive object regions vs. a variety of negative background regions, we propose to incorporate triplet embedding into the learning objective. The triplet units are formed by assigning each negative region to a meaningful object class and establishing class-specific negatives, followed by triplet construction. On the PASCAL VOC 2007 benchmark, the proposed triplet embedding improves the performance of the well-known FastRCNN model with a mAP gain of 2.1%. In particular, the state-of-the-art approach OHEM also benefits from the triplet embedding, achieving a mAP improvement of 1.2%.

“…Liang et al. [29] explicitly constructed a semantic neuron graph network by incorporating the semantic concept hierarchy. On the other hand, there are some sequential reasoning models for relationships [4,21]. In these works, a fixed graph is usually considered, while our Graphonomy makes further efforts from external knowledge embedding to graph representation transfer.…”

Section: Related Work (mentioning)

confidence: 99%

Graphonomy: Universal Human Parsing via Graph Transfer Learning

Gong, Gao, Liang, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Self Cite

Prior highly-tuned human parsing models tend to fit each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at various levels of granularity. This poses many fundamental learning challenges, e.g., discovering underlying semantic structures among different label granularities, performing proper transfer learning across different image domains, and identifying and utilizing label redundancies across related tasks. To address these challenges, we propose a new universal human parsing agent, named "Graphonomy", which incorporates hierarchical graph transfer learning upon the conventional parsing network to encode the underlying label semantic structures and propagate relevant semantic information. In particular, Graphonomy first learns and propagates a compact high-level graph representation among the labels within one dataset via Intra-Graph Reasoning, and then transfers semantic information across multiple datasets via Inter-Graph Transfer. Various graph transfer dependencies (e.g., similarity, linguistic knowledge) between different datasets are analyzed and encoded to enhance graph transfer capability. By distilling the universal semantic graph representation to each specific task, Graphonomy is able to predict all levels of parsing labels in one system without piling up complexity. Experimental results show that Graphonomy achieves state-of-the-art results on three human parsing benchmarks as well as advantageous universal human parsing performance.

“…The common characteristic of these methods is that they only focus on single instance problems. For multi-object recognition, AC-CNN [15], LPA [13] and RelationNet [11] have been proposed to discover a global contextual guidance. AC-CNN examines the global context through the stacked Long Short-Term Memory (LSTM) units.…”

Section: Visual Attention (mentioning)

confidence: 99%

ASSD: Attentive single shot multibox detector

Metaxas 2019

Computer Vision and Image Understanding

This paper proposes a new deep neural network for object detection. The proposed network, termed ASSD, builds feature relations in the spatial space of the feature map. With the global relation information, ASSD learns to highlight useful regions on the feature maps while suppressing irrelevant information, thereby providing reliable guidance for object detection. Compared to methods that rely on complicated CNN layers to refine the feature maps, ASSD is simple in design and computationally efficient. Experimental results show that ASSD competes favorably with state-of-the-art methods, including SSD, DSSD, FSSD and RetinaNet. Code is available at: https://github.com/yijingru/ASSD-Pytorch.
