Learning Spatial Awareness to Improve Crowd Counting

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

et al. 2019


Visual counting, a task that aims to estimate the number of objects in an image or video, is by nature an open-set problem: in theory, the count can take any value in [0, +∞). In reality, however, collected data and labeled instances are limited, so only a small closed set of counts is ever observed. Existing methods typically model this task as regression, and they tend to fail on unseen scenes whose counts fall outside the scope of the closed set. In fact, counting has an interesting and distinctive property: it is spatially decomposable. A dense region can always be divided until the sub-region counts fall within the previously observed closed set. We therefore introduce the idea of spatial divide-and-conquer (S-DC), which transforms open-set counting into a closed-set problem. This idea is implemented by a novel Supervised Spatial Divide-and-Conquer Network (SS-DCNet). Thus, SS-DCNet learns only from a closed set but generalizes well to open-set scenarios via S-DC. SS-DCNet is also efficient: to avoid repeatedly computing convolutional features for sub-regions, S-DC is executed on the feature map instead of on the input image. We provide theoretical analyses as well as a controlled experiment on toy data, demonstrating why closed-set modeling makes sense. Extensive experiments show that SS-DCNet achieves state-of-the-art performance on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), a vehicle counting dataset (TRANCOS) and a plant counting dataset (MTC), with relative improvements of 7.7% on UCF-QNRF, 33.1% on TRANCOS, and 26.4% on MTC. SS-DCNet also reports state-of-the-art cross-domain performance on crowd counting datasets. In particular, when transferring from UCF-QNRF to ShanghaiTech Part_A, SS-DCNet even beats most existing models trained directly on the target domain. Code and models have been made available at: https://tinyurl.com/SS-DCNet.

Section: Single-stage Spatial Divide-and-Conquer (mentioning)

confidence: 99%
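The spatial divide-and-conquer idea in the abstract above can be illustrated with a toy sketch. It assumes a hypothetical `closed_set_counter` that can only output counts in [0, c_max] (modelled here by clipping the true sum of a density grid) and a simple, assumed rule: a region is split into quadrants whenever that counter saturates. SS-DCNet itself learns the division decision and operates on CNN feature maps, so this is an intuition aid rather than the paper's algorithm.

```python
from typing import List, Tuple

Grid = List[List[int]]


def region_sum(density: Grid, r0: int, r1: int, c0: int, c1: int) -> int:
    """Integrate the density grid over rows [r0, r1) and columns [c0, c1)."""
    return sum(density[r][c] for r in range(r0, r1) for c in range(c0, c1))


def closed_set_counter(density: Grid, r0: int, r1: int, c0: int, c1: int, c_max: int) -> int:
    """Hypothetical counter that can only predict counts in the closed set [0, c_max]."""
    return min(region_sum(density, r0, r1, c0, c1), c_max)


def halves(lo: int, hi: int) -> List[Tuple[int, int]]:
    """Split [lo, hi) into two halves, or keep it whole if it has length 1."""
    if hi - lo > 1:
        mid = (lo + hi) // 2
        return [(lo, mid), (mid, hi)]
    return [(lo, hi)]


def sdc_count(density: Grid, r0: int, r1: int, c0: int, c1: int, c_max: int) -> int:
    """Toy spatial divide-and-conquer: split a region while the closed-set counter saturates."""
    pred = closed_set_counter(density, r0, r1, c0, c1, c_max)
    row_parts, col_parts = halves(r0, r1), halves(c0, c1)
    cannot_split = len(row_parts) == 1 and len(col_parts) == 1
    if pred < c_max or cannot_split:
        # Count lies inside the closed set (or the region is a single cell): conquer.
        return pred
    # Divide: the region count is the sum of its sub-region counts.
    return sum(
        sdc_count(density, ra, rb, ca, cb, c_max)
        for ra, rb in row_parts
        for ca, cb in col_parts
    )


if __name__ == "__main__":
    grid = [[3] * 4 for _ in range(4)]  # 16 cells x 3 objects = 48 objects
    print(closed_set_counter(grid, 0, 4, 0, 4, c_max=10))  # saturates at 10
    print(sdc_count(grid, 0, 4, 0, 4, c_max=10))           # recovers 48
```

A single cell whose count already exceeds `c_max` cannot be recovered by this rule: division only helps until the sub-region counts fall inside the observed closed set.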

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

Xiong

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

et al. 2019


“…We follow the previous works [7,26,34,38,52] to use the mean absolute error (MAE) and the root mean square error (MSE) to evaluate all the methods. Assume … is the count of the …-th image, … is the corresponding groundtruth,…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
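The two metrics named in the excerpt above are conventionally computed as follows in the crowd counting literature, where "MSE" denotes the root of the mean squared error: MAE = (1/N) Σ |ẑ_i − z_i| and MSE = sqrt((1/N) Σ (ẑ_i − z_i)²) over N test images. The sketch assumes `preds` and `gts` hold per-image estimated and ground-truth counts; the names are illustrative.

```python
import math
from typing import Sequence, Tuple


def mae_mse(preds: Sequence[float], gts: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, MSE) over N images, with MSE taken as the root mean squared error."""
    if len(preds) != len(gts) or len(preds) == 0:
        raise ValueError("preds and gts must be non-empty and of equal length")
    errors = [p - g for p, g in zip(preds, gts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, mse


if __name__ == "__main__":
    # Errors are -2 and 4, so MAE = 3.0 and MSE = sqrt((4 + 16) / 2) = sqrt(10).
    print(mae_mse([10, 20], [12, 16]))
```

Because MSE squares each error before averaging, it weights large per-image miscounts more heavily than MAE does.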

Deep Structural Contour Detection

Deng

Proceedings of the 28th ACM International Conference on Multimedia

2020

Object contour detection is a fundamental preprocessing step for multimedia applications such as icon generation, object segmentation, and tracking. The quality of contour prediction is of great importance in these applications since it affects the subsequent processing. In this work, we aim to develop a high-performance contour detection system. We first propose a novel yet very effective loss function for contour detection, which penalizes the distance in contour-structure similarity between each prediction and its ground truth. Moreover, to better distinguish object contours from background textures, we introduce a novel convolutional encoder-decoder network. Within the network, we present a hyper module that captures dense connections among high-level features and produces effective semantic information. This information is then progressively propagated and fused with low-level features. We conduct extensive experiments on the BSDS500 and Multi-Cue datasets; the results show significant improvements over state-of-the-art competitors. We further demonstrate the benefit of our DSCD method for crowd counting.

“…The integral of the density map gives the crowd count in the image [12]. Researches in recent trends focused on designing more powerful DNN structures and exploiting more effective learning paradigms [2,4,14,18,27,31,34]. For instance, Guo et al [4] designed multi-rate dilated convolutions to capture rich spatial context at different scales of density maps; Liu et al [18] introduced an improved dilated multi-scale structure similarity (DMS-SSIM) loss to learn density maps with local consistency; Xu et al [37] [17] and they are not capable of providing individual locations in the crowds, which, on the other hand, are believed to be the merits of detection-based crowd counting methods, as specified below.…”

Section: Regression-based Crowd Counting (mentioning)

confidence: 99%
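The excerpt's statement that integrating a density map gives the crowd count can be checked with a small sketch. It assumes the common construction of placing one Gaussian kernel at each annotated head position, with a fixed width `sigma` chosen here only for illustration; because each kernel is renormalized to sum to 1 inside the image, the discrete integral of the map equals the number of annotations.

```python
import math
from typing import List, Sequence, Tuple


def density_map(points: Sequence[Tuple[int, int]], height: int, width: int,
                sigma: float = 1.5) -> List[List[float]]:
    """Place one unit-mass Gaussian per annotated point (row, col) on a height x width grid."""
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Unnormalized Gaussian weights over the whole image.
        weights = [[math.exp(-((r - py) ** 2 + (c - px) ** 2) / (2 * sigma ** 2))
                    for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in weights)
        # Normalize so each point contributes exactly one unit of mass, even near borders.
        for r in range(height):
            for c in range(width):
                density[r][c] += weights[r][c] / total
    return density


def integrate(density: List[List[float]]) -> float:
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    heads = [(2, 3), (5, 5), (0, 7)]  # hypothetical head annotations
    dmap = density_map(heads, height=8, width=8)
    print(round(integrate(dmap), 6))  # 3.0
```

Without the per-point renormalization, kernels centred near the image border would lose mass outside the grid and the integral would undercount.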

Towards Unsupervised Crowd Counting via Regression-Detection Bi-knowledge Transfer

Proceedings of the 28th ACM International Conference on Multimedia

Wang

Shi

et al. 2020

Unsupervised crowd counting is a challenging yet largely unexplored task. In this paper, we explore it in a transfer learning setting, where we learn to detect and count persons in an unlabeled target set by transferring bi-knowledge learnt from regression- and detection-based models in a labeled source set. The dual source knowledge of the two models is heterogeneous and complementary, as they capture different modalities of the crowd distribution. We formulate the mutual transformations between the outputs of the regression- and detection-based models as two scene-agnostic transformers, which enable knowledge distillation between the two models. Given the regression- and detection-based models and their mutual transformers learnt in the source, we introduce an iterative self-supervised learning scheme with regression-detection bi-knowledge transfer in the target. Extensive experiments on the standard crowd counting benchmarks ShanghaiTech, UCF_CC_50, and UCF_QNRF demonstrate a substantial improvement of our method over other state-of-the-art methods in the transfer learning setting.

CCS Concepts: • Information systems → Multimedia information systems; • Human-centered computing → Collaborative and social computing; Visualization; • Computing methodologies → Computer vision.
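The abstract describes learned transformers that map between regression (density-map) outputs and detection outputs. As a rough, hand-crafted intuition only, the regression-to-detection direction can be approximated by reading candidate person locations off a density map as thresholded local maxima. The function name, the 3×3 neighborhood, and the threshold are assumptions for illustration, not the paper's learned transformer.

```python
from typing import List, Tuple


def density_to_points(density: List[List[float]], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are strict 3x3 local maxima above `threshold`."""
    h, w = len(density), len(density[0])
    points = []
    for r in range(h):
        for c in range(w):
            v = density[r][c]
            if v <= threshold:
                continue
            neighbors = [
                density[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Keep the cell only if it is strictly larger than every neighbor.
            if all(v > n for n in neighbors):
                points.append((r, c))
    return points


if __name__ == "__main__":
    dmap = [
        [0.0, 0.1, 0.0, 0.0, 0.0],
        [0.1, 0.6, 0.1, 0.0, 0.0],
        [0.0, 0.1, 0.0, 0.2, 0.1],
        [0.0, 0.0, 0.2, 0.5, 0.2],
        [0.0, 0.0, 0.1, 0.2, 0.1],
    ]
    print(density_to_points(dmap))  # [(1, 1), (3, 3)]
```

The strict comparison means a flat plateau of equal values yields no detection; a practical version would need tie-breaking or non-maximum suppression, and it still could not separate heavily overlapping people, which is one reason the paper learns this mapping instead.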