Divide and Grow: Capturing Huge Diversity in Crowd Images with Incrementally Growing CNN

Sam, Deepak Babu; Sajjan, Neeraj N; Babu, R. Venkatesh; Srinivasan, Mukundhan

doi:10.1109/cvpr.2018.00381

Cited by 206 publications

(145 citation statements)

References 14 publications

Mentioning: 143

“…A Spatial Softmax function is applied at the end of the Upsampler, which constrains the sum of upsampling weights in each 2 × 2 adjacent region to be 1 and ensures consistent local count values in the same image area after upsampling. The final output channel is 1 for R-Counter and class num for C-Counter…”

Section: Classification-based Counter (C-counter) (mentioning)

confidence: 99%
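The normalization described in the statement above can be illustrated with a minimal sketch. It assumes that the four softmax weights of each 2 × 2 block are used to split one coarse local count into four finer counts; that usage, and the names `softmax4`, `upsample_counts`, and `logits`, are hypothetical and not taken from the cited paper.

```python
import math


def softmax4(logits):
    """Softmax over the four raw scores of one 2x2 block (numerically stable)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def upsample_counts(counts, logits):
    """Split each cell of an H x W count map into a 2x2 block.

    counts: H x W nested list of local counts.
    logits: H x W nested list of 4-element lists (hypothetical raw scores,
            ordered top-left, top-right, bottom-left, bottom-right).
    Returns a 2H x 2W map; each 2x2 block sums to its source cell because
    the softmax weights of that block sum to 1.
    """
    h, w = len(counts), len(counts[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            wts = softmax4(logits[i][j])
            out[2 * i][2 * j] = counts[i][j] * wts[0]
            out[2 * i][2 * j + 1] = counts[i][j] * wts[1]
            out[2 * i + 1][2 * j] = counts[i][j] * wts[2]
            out[2 * i + 1][2 * j + 1] = counts[i][j] * wts[3]
    return out


# Usage: a 1 x 2 count map becomes 2 x 4, and the total count is unchanged.
up = upsample_counts([[8.0, 2.0]], [[[0.0] * 4, [1.0, 0.5, -0.5, 2.0]]])
```

Because the weights are normalized per block, the upsampled map preserves every local count regardless of the raw scores.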

From Open Set to Closed Set: Counting Objects by Spatial Divide-and-Conquer

Xiong, Liu, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Visual counting, a task that aims to estimate the number of objects from an image/video, is an open-set problem by nature, i.e., the count can in theory vary in [0, +∞). However, collected data and labeled instances are limited in reality, which means that only a small closed set is observed. Existing methods typically model this task as regression, but they tend to fail on unseen scenes whose counts fall outside the closed set. In fact, counting has an interesting and exclusive property: it is spatially decomposable. A dense region can always be divided until sub-region counts are within the previously observed closed set. We therefore introduce the idea of spatial divide-and-conquer (S-DC) that transforms open-set counting into a closed-set problem. This idea is implemented by a novel Supervised Spatial Divide-and-Conquer Network (SS-DCNet). Thus, SS-DCNet learns only from a closed set but generalizes well to open-set scenarios via S-DC. SS-DCNet is also efficient. To avoid repeatedly computing sub-region convolutional features, S-DC is executed on the feature map instead of on the input image. We provide theoretical analyses as well as a controlled experiment on toy data, demonstrating why closed-set modeling makes sense. Extensive experiments show that SS-DCNet achieves state-of-the-art performance on three crowd counting datasets (ShanghaiTech, UCF_CC_50 and UCF-QNRF), a vehicle counting dataset (TRANCOS) and a plant counting dataset (MTC), with a 7.7% relative improvement on UCF-QNRF, 33.1% on TRANCOS, and 26.4% on MTC. SS-DCNet also reports state-of-the-art cross-domain performance on crowd counting datasets. In particular, in the transfer from UCF-QNRF to ShanghaiTech Part_A, SS-DCNet even beats most existing models trained directly on the target domain. Code and models have been made available at: https://tinyurl.com/SS-DCNet.
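The spatial decomposability argument in this abstract can be made concrete with a small toy sketch. It is not SS-DCNet: the "counter" here is a hypothetical stand-in that returns the true count but saturates at `c_max` (mimicking a model that has only seen counts in a closed set), the split rule is a simple assumption (divide whenever the counter hits its upper bound), and the division is done on point coordinates rather than on a feature map.

```python
def closed_set_count(points, box, c_max):
    """Hypothetical closed-set counter: true count inside `box`, capped at c_max."""
    x0, y0, x1, y1 = box
    n = sum(1 for (x, y) in points if x0 <= x < x1 and y0 <= y < y1)
    return min(n, c_max)


def sdc_count(points, box, c_max, max_depth=8):
    """Count by spatial divide-and-conquer.

    If the closed-set counter stays below its upper bound, trust it.
    Otherwise split the box into four half-open quadrants and sum their
    counts; max_depth guards against unbounded recursion on coincident points.
    """
    c = closed_set_count(points, box, c_max)
    if c < c_max or max_depth == 0:
        return c
    x0, y0, x1, y1 = box
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quadrants = [
        (x0, y0, xm, ym), (xm, y0, x1, ym),
        (x0, ym, xm, y1), (xm, ym, x1, y1),
    ]
    return sum(sdc_count(points, q, c_max, max_depth - 1) for q in quadrants)


# Usage: 100 points on a regular grid in the unit square, counter capped at 20.
points = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
direct = closed_set_count(points, (0.0, 0.0, 1.0, 1.0), 20)  # saturates at the cap
divided = sdc_count(points, (0.0, 0.0, 1.0, 1.0), 20)        # recovers the full count
```

The direct estimate is stuck at the cap, while the divided estimate recovers the full count because every leaf region falls back inside the closed set.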

“…While these methods build techniques that are robust to scale variations, more recent methods have focused on other aspects such as progressively increasing the capacity of the network based on dataset [3], use of adversarial loss to reduce blurry effects in the predicted output maps [49,56], learning generalizable features via deep negative correlation based learning [51], leveraging unlabeled data for counting by introducing a learning to rank framework [34], cascaded feature fusion [43] and scale-based feature aggregation [7], and weakly-supervised learning for crowd counting [58]. Recently, Idrees et al. [19] created a new large-scale high-density crowd dataset with approximately 1.25 million head annotations and a new localization task for crowded images.…”

Section: Related Work (mentioning)

confidence: 99%

Multi-Level Bottom-Top and Top-Bottom Feature Fusion for Crowd Counting

Sindagi and Patel, 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Crowd counting presents enormous challenges in the form of large variation in scales within images and across the dataset. These issues are further exacerbated in highly congested scenes. Approaches based on straightforward fusion of multi-scale features from a deep network seem to be obvious solutions to this problem. However, these fusion approaches do not yield significant improvements in the case of crowd counting in congested scenes. This is usually due to their limited abilities in effectively combining the multi-scale features for problems like crowd counting. To overcome this, we focus on how to efficiently leverage information present in different layers of the network. Specifically, we present a network that involves: (i) a multilevel bottom-top and top-bottom fusion (MBTTBF) method to combine information from shallower to deeper layers and vice versa at multiple levels, (ii) scale complementary feature extraction blocks (SCFB) involving cross-scale residual functions to explicitly enable flow of complementary features from adjacent conv layers along the fusion paths. Furthermore, in order to increase the effectiveness of the multi-scale fusion, we employ a principled way of generating scale-aware ground-truth density maps for training. Experiments conducted on three datasets that contain highly congested scenes (ShanghaiTech, UCF CROWD 50, and UCF-QNRF) demonstrate that the proposed method is able to outperform several recent methods in all the datasets.

“…In addition to multi-column networks, there are a lot of methods to improve scale invariance of feature learning by 1) studying on the fusion of multi-scale features [35,57,62,63], 2) studying on multi-blob based scale aggregation networks [7,64], 3) designing scale-invariant convolutional or pooling layers [21,30,33,56,62], and 4) studying on automated scale adaptive networks [48,49,66]. On the other hand, a lot of studies devote to using perspective maps [52], geometric constraints [34,68], and region-of-interest [33] to further improve the counting accuracy.…”

Section: CNN-based Methods (mentioning)

confidence: 99%

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

Cheng, Dai, et al. 2019

Proceedings of the 27th ACM International Conference on Multimedia

Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, due to the substantial redundant parameters in columns, existing multi-column networks invariably exhibit almost the same scale features in different columns, which severely affects counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multi-column Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which can approximately indicate the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features with different image scales. 2) We devise a mutual learning scheme that can alternately optimize each column while keeping the other columns fixed on each mini-batch of training data. With such an asynchronous parameter update process, each column is inclined to learn a different feature representation from the others, which can efficiently reduce the parameter redundancy and improve generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML can significantly improve the original multi-column networks and outperform the other state-of-the-art approaches.
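The alternating update in point 2) of this abstract can be illustrated with a toy sketch. It is not McML: each "column" is reduced to a single scalar weight of a linear model, the mutual-information term and the statistical network are omitted entirely, and the function name `alternating_sgd` and its parameters are hypothetical. The only idea shown is that, on each mini-batch, one column is updated at a time while the others are held fixed.

```python
def alternating_sgd(batches, n_columns=2, lr=0.05, epochs=100):
    """Toy alternating optimization.

    Model: prediction = sum_k w[k] * x[k], where w[k] plays the role of
    column k. On every mini-batch the columns are visited in turn, and the
    gradient step for column k leaves all other weights untouched.
    batches: list of (xs, ys) with xs a list of feature vectors.
    """
    w = [0.0] * n_columns
    for _ in range(epochs):
        for xs, ys in batches:
            for k in range(n_columns):
                grad = 0.0
                for x, y in zip(xs, ys):
                    pred = sum(wi * xi for wi, xi in zip(w, x))
                    grad += 2.0 * (pred - y) * x[k]  # d(squared error)/d w[k]
                w[k] -= lr * grad / len(xs)          # only column k changes
    return w


# Usage: targets follow y = 2 * x0 - 1 * x1, split over two mini-batches.
batches = [
    ([[1.0, 0.0], [0.0, 1.0]], [2.0, -1.0]),
    ([[1.0, 1.0], [1.0, -1.0]], [1.0, 3.0]),
]
weights = alternating_sgd(batches)
```

On this toy data the weights approach 2 and -1, the coefficients used to generate the targets.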
