Reinforcing Local Feature Representation for Weakly-Supervised Dense Crowd Counting

Chen, Xiaoshuang; Lu, Hongtao

doi:10.48550/arxiv.2202.10681

Cited by 1 publication

(1 citation statement)

References 26 publications

(62 reference statements)


“…It constructs a weakly supervised model from sequence-to-count perspective. SFSL [5] introduces a learnable unbiased feature estimation of persons and utilizes the feature similarity for the regression of crowd numbers to solve the lack of local supervision. CrowdMLP [37] proposes a multi-granularity multilayer perceptron (MLP) regressor to enlarge receptive fields and a split-counting to decouple spatial constraints.…”

Section: Transformer Based Crowd Counting (mentioning)

confidence: 99%

RGB-T Multi-Modal Crowd Counting Based on Transformer

Liu, Wu, Tan et al. 2023

Preprint


Crowd counting aims to estimate the number of persons in a scene. Most state-of-the-art crowd counting methods based on color images do not work well in poor illumination conditions because objects become invisible. With the widespread use of infrared cameras, crowd counting based on color and thermal images has been studied. Existing methods only achieve multi-modal fusion without a count objective constraint. To better exploit multi-modal information, we use count-guided multi-modal fusion and modal-guided count enhancement to achieve impressive performance. The proposed count-guided multi-modal fusion module uses a multi-scale token transformer to let the two modalities interact under the guidance of count information and to perceive different scales from the token perspective. The proposed modal-guided count enhancement module employs a multi-scale deformable transformer decoder structure to enhance one modality's features and count information using the other modality. Experiments on the public RGBT-CC dataset show that our method achieves new state-of-the-art results. https://github.com/liuzywen/RGBTCC

Introduction

Crowd counting can predict the distribution of a crowd and estimate the number of persons in unconstrained scenes. It is widely studied by the academic and industrial communities, since the number of persons is an important indicator for incident monitoring [31], traffic control [19], and infectious disease prevention [32]. Existing crowd counting methods have improved tremendously due to the introduction of convolutional neural networks [7,8] and transformers [28,40].

However, when light is insufficient, the performance of crowd counting is unsatisfactory, as shown in the first row of Fig. 1. A thermal image captures the temperature of objects, which makes persons recognizable. Therefore, RGB-Thermal (RGB-T) crowd counting, which introduces the thermal modality, has attracted a lot of attention.

