A cross-modal fusion based approach with scale-aware deep representation for RGB-D crowd counting and density estimation

Zhang, Shihui; Li, He; Kong, Weihang

doi:10.1016/j.eswa.2021.115071

Cited by 18 publications (7 citation statements); references 17 publications

“…The idea of this group's methods, such as [39,61,64,67] in its most simplistic form is to obtain a density map from an image and then integrate it in order to get the estimation of people in the image. Contrary to the previous approaches, these also consider the spatial information.…”

Section: Density Based Approaches (mentioning)
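The density-map idea in the statement above (obtain a per-pixel density, then integrate it to get the count) can be illustrated with a minimal NumPy sketch. It builds a ground-truth density map from head annotations with a fixed-width Gaussian kernel and sums it. The names `density_map` and `count_from_density` and the value `sigma=4.0` are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np


def density_map(points, shape, sigma=4.0):
    """Ground-truth density map: one unit-mass Gaussian per annotated head (x, y).

    sigma=4.0 is an illustrative fixed kernel width (an assumption).
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for x, y in points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Renormalise so every head contributes exactly 1, even when its
        # kernel is clipped by the image border.
        density += kernel / kernel.sum()
    return density


def count_from_density(density):
    """Integrating a discrete density map is simply summing its pixels."""
    return float(density.sum())


heads = [(10, 12), (40, 30), (41, 33)]  # toy (x, y) head annotations
D = density_map(heads, shape=(64, 64))
print(round(count_from_density(D), 6))  # 3.0
```

In a trained counter the same summation is applied to the network's predicted map. Unlike a bare count, the map also records where the people are, which is the spatial information the statement refers to.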

“…In recent surveys [48,11] authors classify CNN-based approaches into four categories, based on the property of the networks: Basic CNNs include networks with basic CNN layers and represent initial deep learning approaches for crowd counting [14,35,56,58,67], scale-aware models that leverage multi-column or multi-resolution architectures to achieve scale robustness [3,25,37,68], context-aware models that incorporate global and local contextual information to improve performance [45,46], and multi-task frameworks that combine crowd counting with tasks such as crowd velocity estimation, etc. [2,47,66,70] Based on the inference methodology, they also classify them into patch-based, where models are trained using patches from the image and the inference is done using sliding window approach [2,3,14,25,35,37,56,58,66,70], and whole image-based [45,46,47,68,60].…”

Section: Density Based Approaches (mentioning)
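For the patch-based inference mode described in the statement above, here is a minimal sketch of sliding-window density estimation. `model` stands for a hypothetical callable that maps an image patch to a density map of the same height and width; averaging predictions where windows overlap is one common design choice, assumed here rather than taken from the surveys.

```python
import numpy as np


def window_starts(length, patch, stride):
    """Top-left offsets along one axis so that the windows cover [0, length)."""
    starts = list(range(0, max(length - patch, 0) + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # extra window flush with the far border
    return starts


def sliding_window_density(image, model, patch=32, stride=16):
    """Run `model` on overlapping patches and average the stitched density maps."""
    if stride > patch:
        raise ValueError("stride must not exceed patch size, or pixels are skipped")
    h, w = image.shape[:2]
    summed = np.zeros((h, w))
    hits = np.zeros((h, w))  # how many windows covered each pixel
    for y in window_starts(h, patch, stride):
        for x in window_starts(w, patch, stride):
            tile = image[y:y + patch, x:x + patch]
            summed[y:y + patch, x:x + patch] += model(tile)
            hits[y:y + patch, x:x + patch] += 1
    return summed / hits


# Toy check with an "oracle" model that returns its input unchanged, so the
# stitched map should reproduce the input density.
rng = np.random.default_rng(0)
true_density = rng.random((70, 90)) * 0.01
estimate = sliding_window_density(true_density, model=lambda p: p)
print(np.allclose(estimate, true_density))  # True
```

The count is then `estimate.sum()`. Whole-image methods skip the tiling and run the network once on the full frame.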

Re-evaluation of the CNN-based state-of-the-art crowd-counting methods with enhancements

Tersek, Kljun, Peer, et al., 2022

ComSIS

Crowd counting has a range of applications and is an important task that can help prevent accidents such as crowd crushes and stampedes at political protests, concerts, sporting events, and other social gatherings. Many crowd counting approaches have been proposed in recent years. In this paper we compare five deep-learning-based approaches to crowd counting, re-evaluate them, and present a novel CSRNet-based approach. We base our implementation on five convolutional neural network (CNN) architectures: CSRNet, Bayesian Crowd Counting, DM-Count, SFA-Net, and SGA-Net; the novel approach upgrades CSRNet with a Bayesian crowd counting loss function and pixel modeling. The models are trained and evaluated on three widely used crowd image datasets: ShanghaiTech Part A, ShanghaiTech Part B, and UCF-QNRF. The results show that models based on SFA-Net and DM-Count outperform the state of the art when trained and evaluated on similar data, and that the proposed extended model outperforms the base model with the same backbone when trained and evaluated on significantly different data, suggesting improved robustness.
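The abstract mentions upgrading CSRNet with a Bayesian crowd counting loss. As a rough illustration only, the sketch below implements the basic Bayesian loss formulation of Ma et al. (ICCV 2019): each pixel's predicted density is softly assigned to the annotated heads through Gaussian posteriors, and the expected count assigned to every head is pushed towards 1. The background term of the "Bayesian+" variant is omitted, and `sigma=8.0` and the function name are assumptions; this is not the implementation used in the paper.

```python
import numpy as np


def bayesian_loss(density, points, sigma=8.0):
    """Basic Bayesian loss (no background term) for one image.

    density: (H, W) predicted density map; points: list of (x, y) head positions.
    """
    h, w = density.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)  # (M, 2)
    heads = np.asarray(points, dtype=np.float64).reshape(-1, 2)  # (N, 2)
    if len(heads) == 0:
        raise ValueError("the basic formulation needs at least one annotation")

    # Squared distance between every head n and every pixel m -> (N, M).
    sq_dist = ((heads[:, None, :] - pixels[None, :, :]) ** 2).sum(axis=2)
    # Equal-variance Gaussian likelihoods with a uniform prior: the Gaussian
    # normalising constant cancels, so a softmax over heads gives p(y_n | x_m).
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits -= logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    posterior = weights / weights.sum(axis=0, keepdims=True)

    expected_counts = posterior @ density.ravel()  # E[c_n], shape (N,)
    return float(np.abs(1.0 - expected_counts).sum())


heads = [(5, 5), (50, 50)]
perfect = np.zeros((64, 64))
for x, y in heads:
    perfect[y, x] = 1.0  # all mass exactly on the annotated pixels
print(bayesian_loss(perfect, heads) < 1e-6)  # True
print(bayesian_loss(np.zeros((64, 64)), heads))  # 2.0
```

Because the supervision is on expected per-head counts rather than on a fixed Gaussian target map, no kernel-width choice is baked into the ground truth.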

“…Since there are few RGBD crowd counting datasets currently, there is not much work to complete crowd counting based on RGBD images [54,55]. In these studies, depth information usually provides prior knowledge of head position for RGB image segmentation.…”

Section: Related Work (mentioning)

Research on Local Counting and Object Detection of Multiscale Crowds in Video Based on Time-Frequency Analysis

Ren, 2022

Journal of Sensors

Objective. Real-time crowd counting from camera footage has become very difficult in congested scenes. Methods. This paper proposes a DRC-ConvLSTM network that combines a depth-aware model with a depth-adaptive Gaussian kernel to extract spatio-temporal features and to match crowd depth levels under depth-space edge constraints in videos, yielding satisfactory crowd density estimates. The model is trained with weak supervision on point-labeled images. For detection, the paper proposes a deep adaptive perception network, DRD-NET, which uses the density map and an RGBD-adaptive perception network to better initialize the size and position of head bounding boxes. Results. The method achieves the best performance in RGBD dense video crowd counting on five labeled sequence datasets (MICC, CrowdFlow, FDST, Mall, and UCSD), which were used to verify its effectiveness. Conclusion. The proposed DRD-NET model combined with DRC-ConvLSTM outperforms the existing ConvLSTM model for video crowd counting, and ablation experiments further confirm the effectiveness of each part of the model.
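The abstract does not specify how the depth-adaptive Gaussian kernel is defined. As a hedged illustration of the general idea, the sketch below assumes a pinhole-camera model in which apparent head size shrinks roughly in proportion to 1/depth, so each head's kernel width is set to `k / depth` and clipped to a range. The constants `k`, `sigma_min`, `sigma_max` and the function name are hypothetical, not the paper's actual settings.

```python
import numpy as np


def depth_adaptive_density(points, depth, k=40.0, sigma_min=1.0, sigma_max=15.0):
    """Density map whose Gaussian width per head is k / depth (an assumption).

    points: (x, y) head positions; depth: (H, W) depth map aligned with the RGB image.
    Returns the density map and the sigma chosen for each head.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros((h, w))
    sigmas = []
    for x, y in points:
        d = float(depth[int(round(y)), int(round(x))])
        # Nearer heads (small depth) look larger, so they get wider kernels.
        sigma = float(np.clip(k / max(d, 1e-6), sigma_min, sigma_max))
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # each head still integrates to 1
        sigmas.append(sigma)
    return density, sigmas


# Toy depth map: distance grows from 2 at the top row to 10 at the bottom row.
depth = np.repeat(np.linspace(2.0, 10.0, 64)[:, None], 64, axis=1)
D, sigmas = depth_adaptive_density([(10, 5), (30, 60)], depth)
print([round(s, 2) for s in sigmas])  # [15.0, 4.16]
```

Using the depth channel this way replaces a neighbour-distance heuristic with a direct scale cue, which is the motivation for RGB-D input in this line of work.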

“…There are currently methods that utilize adaptive Gaussian kernels to generate high-quality density maps [18,19]. High-quality density maps train more robust regression networks, providing prior knowledge for crowd detection that is closer to the actual distribution of crowds [20]. One of the reasons that previous detection methods cannot detect small heads is due to the lack of scale perceptron or the limitation of its own structure.…”

Section: Introduction (mentioning)
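Adaptive Gaussian kernels like those mentioned in the statement above are often implemented with the geometry-adaptive rule popularised by MCNN (Zhang et al., 2016): each head's kernel width is a fraction β of the mean distance to its k nearest annotated neighbours, so heads in dense regions get narrow kernels. The sketch uses the commonly quoted β = 0.3 and k = 3; the fallback width for isolated heads and the lower bound on sigma are assumptions, and the cited works [18,19] may use a different variant.

```python
import numpy as np


def geometry_adaptive_density(points, shape, k=3, beta=0.3,
                              default_sigma=15.0, min_sigma=1.0):
    """Density map with sigma_i = beta * mean distance to the k nearest heads.

    default_sigma (used when a head has no neighbours) and min_sigma (guards
    against coincident annotations) are illustrative assumptions.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape)
    sigmas = []
    # Pairwise Euclidean distances between all annotated heads.
    dists = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
    for i, (x, y) in enumerate(pts):
        neighbours = np.sort(dists[i])[1:k + 1]  # index 0 is the head itself
        sigma = beta * neighbours.mean() if len(neighbours) else default_sigma
        sigma = max(sigma, min_sigma)
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # unit mass per head
        sigmas.append(float(sigma))
    return density, sigmas


cluster = [(20, 20), (22, 20), (20, 22), (22, 22)]  # tightly packed heads
isolated = [(50, 50)]                               # one distant head
D, sigmas = geometry_adaptive_density(cluster + isolated, shape=(64, 64))
print(sigmas[0] < sigmas[-1])  # True
```

Heads in the dense cluster receive narrow kernels while the isolated head receives a wide one, which approximates perspective-dependent head size without any depth input.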

Enhancement of Local Crowd Location and Count: Multiscale Counting Guided by Head RGB-Mask

Ren, Wang, et al., 2022

Computational Intelligence and Neuroscience

Background. In dense crowd images, traditional detection models often suffer from inaccurate counts of multiscale targets and a low recall rate. Methods. To address these two problems, this paper proposes an MLP-CNN model which, combined with an FPN feature pyramid, fuses low-resolution and high-resolution semantic feature maps with little computation and thereby mitigates inaccurate head counts for people at multiple scales. The MLP-CNN “mid-term” fusion model effectively fuses the features of the RGB head image and the RGB-Mask image. With head RGB-Mask annotations and adaptive Gaussian kernel regression, an enhanced density map can be generated, which addresses the low recall of head detection. Results. The MLP-CNN model was evaluated on ShanghaiTech, UCF_CC_50, and UCF-QNRF. The test results show that the error of the proposed method is significantly reduced, and the recall rate reaches 79.91%. Conclusion. The MLP-CNN model not only improves counting accuracy in density map regression but also improves the detection rate of multiscale head targets.
