2022
DOI: 10.48550/arxiv.2203.02636
Preprint

Boosting Crowd Counting via Multifaceted Attention

Abstract: This paper focuses on the challenging crowd counting task. Because large-scale variations often exist within crowd images, neither the fixed-size convolution kernels of CNNs nor the fixed-size attention of recent vision transformers can handle such variations well. To address this problem, we propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding. MAN incorporates global attention from the vanilla transformer, learnable local attention, and instance attention into …

Cited by 3 publications (3 citation statements)
References 29 publications
“…This is good evidence that the dataset created and used in this paper is more suitable for real aquaculture scenarios and has more similar characteristics to the high-density characteristics of fish fry in such scenarios. Considering the actual culture conditions of fish fry crowding, this study draws on the idea of crowd density estimation in crowding scenarios [27][28][29] and marker methods from other similar studies [22]. In this study, for high-density, heavily obscured fry populations, the locations of the targets and the number of fish in the image were determined by marking the head of each fish.…”
Section: A Dataset 1) Dataset Acquisition and Annotation
Citation type: mentioning
confidence: 99%
“…i) Local feature aggregation: gene expression prediction can be considered as individually aggregating and identifying the feature of each gene type for the slide image window. The long-range dependency, i.e., global context, among identified features is needed to reason about complex scenarios [6,7], as those features are generally non-uniformly distributed across the slide image (see Sec. 3 for details).…”
Section: (C)
Citation type: mentioning
confidence: 99%
“…Also, for the Low part, MSE surpasses the previous SOTA method by 27.58%, which is a great improvement. Unfortunately, however, in the Overall part, both metrics lag behind the MAN [46]. NWPU-Crowd is currently the most challenging dataset in the field of crowd counting.…”
Section: B Comparisons and Analysis
Citation type: mentioning
confidence: 99%