Audio-Visual Transformer Based Crowd Counting

Sajid, Usman; Chen, Xiangyu; Sajid, Hasan; Kim, Taejoon; Wang, Guanghui

doi:10.48550/arxiv.2109.01926

Cited by 2 publications

(2 citation statements)

References 55 publications

(85 reference statements)


“…Transformer architecture for vision tasks has recently been presented. Visual transformer (ViT) (Han et al, 2020; Sajid et al, 2021; Truong et al, 2021) establish the possibility of pure transformer architectures for computer vision tasks as a pioneering study. Transformer blocks are utilized as standalone architectures or presented into CNNs for semantic segmentation, image classification, image generation, image enhancement, and object detection to manipulate long-range dependencies.…”

Section: Related Study (mentioning)

confidence: 99%

Automatic Plant Disease Detection Based on Tranvolution Detection Network With GAN Modules Using Leaf Images

Zhang et al. 2022, Front. Plant Sci.


The detection of plant disease is of vital importance in practical agricultural production. It scrutinizes the plant's growth and health condition and guarantees the regular operation and harvest of the agricultural planting to proceed successfully. In recent decades, the maturation of computer vision technology has provided more possibilities for implementing plant disease detection. Nonetheless, detecting plant diseases is typically hindered by factors such as variations in the illuminance and weather when capturing images and the number of leaves or organs containing diseases in one image. Meanwhile, traditional deep learning-based algorithms attain multiple deficiencies in the area of this research: (1) Training models necessitate a significant investment in hardware and a large amount of data. (2) Due to their slow inference speed, models are tough to acclimate to practical production. (3) Models are unable to generalize well enough. Provided these impediments, this study suggested a Tranvolution detection network with GAN modules for plant disease detection. Foremost, a generative model was added ahead of the backbone, and GAN models were added to the attention extraction module to construct GAN modules. Afterward, the Transformer was modified and incorporated with the CNN, and then we suggested the Tranvolution architecture. Eventually, we validated the performance of different generative models' combinations. Experimental outcomes demonstrated that the proposed method satisfyingly achieved 51.7% (Precision), 48.1% (Recall), and 50.3% (mAP), respectively. Furthermore, the SAGAN model was the best in the attention extraction module, while WGAN performed best in image augmentation. Additionally, we deployed the proposed model on Hbird E203 and devised an intelligent agricultural robot to put the model into practical agricultural use.



“…Crowd counting aims to estimate the total number of people in a given static image. This is a very challenging problem in practice since there exists a significant difference in the crowd number in and across different images, varying images resolution, large perspective, and severe occlusions [4], as shown in Fig. 1.…”

Section: Introduction (mentioning)

confidence: 99%

Towards More Effective PRM-based Crowd Counting via A Multi-resolution Fusion and Attention Network

Sajid, Wang 2021, Preprint

Self Cite


The paper focuses on improving the recent plug-and-play patch rescaling module (PRM) based approaches for crowd counting. In order to make full use of the PRM potential and obtain more reliable and accurate results for challenging images with crowd-variation, large perspective, extreme occlusions, and cluttered background regions, we propose a new PRM based multi-resolution and multi-task crowd counting network by exploiting the PRM module with more effectiveness and potency. The proposed model consists of three deep-layered branches with each branch generating feature maps of different resolutions. These branches perform a feature-level fusion across each other to build the vital collective knowledge to be used for the final crowd estimate. Additionally, early-stage feature maps undergo visual attention to strengthen the later-stage channel's understanding of the foreground regions. The integration of these deep branches with the PRM module and the early-attended blocks proves to be more effective than the original PRM based schemes through extensive numerical and visual evaluations on four benchmark datasets. The proposed approach yields a significant improvement by a margin of 12.6% in terms of the RMSE evaluation criterion. It also outperforms state-of-the-art methods in cross-dataset evaluations.

