Cascaded Multi-Task Learning of Head Segmentation and Density Regression for RGBD Crowd Counting

Zhou, Desen; He, Qian

doi:10.1109/access.2020.2998678

Cited by 8 publications

(2 citation statements)

References 50 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Nevertheless, capturing the complementarities of multimodal data (i.e., RGB and thermal images) is non-trivial. Conventional methods [21,65,37,15,53,45] either feed the combination of multimodal data into deep neural networks or directly fuse their features, which could not well exploit the complementary information. In this work, to facilitate the multimodal crowd counting, we introduce a cross-modal collaborative representation learning framework, which incorporates multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementarities among different modalities.…”

Section: Introductionmentioning

confidence: 99%

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Liu

Chen

et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

100

View full text Add to dashboard Cite

Crowd counting is a fundamental yet challenging problem, which desires rich information to generate pixel-wise crowd density maps. However, most previous methods only utilized the limited information of RGB images and may fail to discover the potential pedestrians in unconstrained environments. In this work, we find that incorporating optical and thermal information can greatly help to recognize pedestrians. To promote future researches in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modalityspecific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) to fully capture the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfer components to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting. Moreover, the proposed approach is universal for multimodal crowd counting and is also capable to achieve superior performance on the ShanghaiTechRGBD [21] dataset.

show abstract

Section: Introductionmentioning

confidence: 99%

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Liu

Chen

et al. 2021

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

100

View full text Add to dashboard Cite

show abstract

“…Existing methods of population counting include detection-based and regression-based methods [15][16][17][18]. Detection-based methods can count population in sparse scenes, but they are limited in crowded scenes.…”

Section: Introductionmentioning

confidence: 99%

Research on 24-Hour Dense Crowd Counting and Object Detection System Based on Multimodal Image Optimization Feature Fusion

Ren

2022

Scientific Programming

View full text Add to dashboard Cite

Motivation. In the environment of day and night video surveillance, in order to improve the accuracy of machine vision dense crowd counting and target detection, this paper designs a day and night dual-purpose crowd counting and crowd detection network based on multimode image fusion. Methods. Two sub-models, RGBD-Net and RGBT-Net, are designed in this paper. The depth image features and thermal imaging features are effectively fused with the features of visible light images, so that the model has stronger anti-interference characteristics and robustness to the light noise interference caused by the sudden fall of light at night. The above models use density map regression-guided detection method to complete population counting and detection. Results. The model completed daytime training and testing on MICC dataset. Through verification, the average absolute error of the model was 1.025, the mean square error was 1.521, and the recall rate of target detection was 97.11%. Night vision training and testing were completed on the RGBT-CC dataset. After verification, the average absolute error of the network was 18.16, the mean square error was 32.14, and the recall rate of target detection was 97.65%. By verifying the effectiveness of the multimode medium-term fusion network, it is found to exceed the current most advanced bimodal fusion method. Conclusion. The experimental results show that the proposed multimodal fusion network can solve the counting and detection problem in the video surveillance environment during day and night. The ablation experiment further proves the effectiveness of the parameters of the two models.

show abstract

Cross-modal collaborative representation and multi-level supervision for crowd counting

Hu²,

Zhao³

et al. 2022

SIViP

View full text Add to dashboard Cite

Cascaded Multi-Task Learning of Head Segmentation and Density Regression for RGBD Crowd Counting

Cited by 8 publications

References 50 publications

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Research on 24-Hour Dense Crowd Counting and Object Detection System Based on Multimodal Image Optimization Feature Fusion

Cross-modal collaborative representation and multi-level supervision for crowd counting

Contact Info

Product

Resources

About