Visual object detection has achieved unprecedented advances with the rise of deep convolutional neural networks. However, detecting tiny objects (for example, tiny persons less than 20 pixels) in large-scale images remains poorly investigated. Extremely small objects raise a grand challenge for feature representation, while massive and complex backgrounds aggravate the risk of false alarms. In this paper, we introduce a new benchmark, referred to as TinyPerson, opening up a promising direction for tiny object detection at long distances and with massive backgrounds. We experimentally find that the scale mismatch between the dataset used for network pre-training and the dataset used for detector learning can deteriorate the feature representation and the detectors. Accordingly, we propose a simple yet effective Scale Match approach to align the object scales between the two datasets for favorable tiny-object representation. Experiments show the significant performance gain of our proposed approach over state-of-the-art detectors, as well as the challenging aspects of TinyPerson related to real-world scenarios. The TinyPerson benchmark and the code for our approach will be publicly available.
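The core of Scale Match is to resample pre-training images so that their object-size distribution matches that of the tiny-object detection dataset. A minimal sketch of this idea is shown below; the function name `scale_match_factor` and the representation of the target distribution as a flat list of sampled sizes are illustrative assumptions, not the paper's exact implementation (which matches against a binned histogram of sizes).

```python
import random

def scale_match_factor(obj_sizes_in_image, target_size_dist, rng=random):
    """Sample a target object size from the detection dataset's empirical
    size distribution and return the resize factor that maps this image's
    mean object size onto it (a minimal sketch of Scale Match)."""
    mean_size = sum(obj_sizes_in_image) / len(obj_sizes_in_image)
    target = rng.choice(target_size_dist)  # empirical distribution as a list of sizes
    return target / mean_size

# Hypothetical example: pre-training persons are ~100-140 px, while
# TinyPerson-like targets are ~12-18 px, so images get downsampled
# heavily before pre-training so that object scales are aligned.
sizes = [100.0, 140.0]            # object sizes (px) in one pre-training image
tiny_dist = [12.0, 15.0, 18.0]    # sampled target size distribution
factor = scale_match_factor(sizes, tiny_dist)
assert 0 < factor < 1             # downscaling toward tiny-object scale
```

After computing the factor, the whole pre-training image (and its bounding boxes) would be resized by it, so the pre-trained backbone sees objects at roughly the same scale as those in the target dataset.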
Weakly supervised object localization (WSOL) is a challenging problem: given only image category labels, it requires learning object localization models. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where convolution operations produce local receptive fields and experience difficulty capturing long-range feature dependencies among pixels. We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in the visual transformer for long-range dependency extraction. TS-CAM first splits an image into a sequence of patch tokens for spatial embedding, which produces attention maps of long-range visual dependency to avoid partial activation. TS-CAM then re-allocates category-related semantics to patch tokens, enabling each of them to be aware of object categories. TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization. Experiments on the ILSVRC/CUB-200-2011 datasets show that TS-CAM outperforms its CNN-CAM counterparts by 7.1%/27.1% for WSOL, achieving state-of-the-art performance.
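The coupling step described above can be sketched as an elementwise product between the transformer's semantic-agnostic attention map and the per-class semantic map derived from the patch tokens. The sketch below is a simplification under stated assumptions: the function name `couple_maps` is hypothetical, and the real TS-CAM aggregates class-token attention across transformer layers and heads before coupling.

```python
import numpy as np

def couple_maps(attention_map, semantic_maps, class_id):
    """Couple a semantic-agnostic attention map (H x W) with the semantic
    map of one class (from a C x H x W stack) by elementwise product,
    yielding a semantic-aware localization map (a minimal sketch)."""
    coupled = attention_map * semantic_maps[class_id]
    # Normalize to [0, 1] so the map can be thresholded for a bounding box.
    rng = coupled.max() - coupled.min()
    return (coupled - coupled.min()) / rng if rng > 0 else coupled

# Toy example with a 4x4 map and 2 classes.
attn = np.random.rand(4, 4)            # long-range attention from the class token
sem = np.random.rand(2, 4, 4)          # category-related semantics per patch
loc_map = couple_maps(attn, sem, class_id=0)
assert loc_map.shape == (4, 4)
assert 0.0 <= loc_map.min() and loc_map.max() <= 1.0
```

In practice, the resulting map is thresholded and the tightest box around the activated region is reported as the localization result, which is the standard WSOL evaluation protocol.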