2022
DOI: 10.1155/2022/3454167

Vision Transformer and Deep Sequence Learning for Human Activity Recognition in Surveillance Videos

Abstract: Human Activity Recognition (HAR) is an active research area, with several Convolutional Neural Network (CNN) based feature extraction and classification methods employed for surveillance and other applications. However, accurately recognizing human activities from a sequence of frames is challenging due to cluttered backgrounds, varying viewpoints, low resolution, and partial occlusion. Current CNN-based techniques use large-scale computational classifiers along with convolutional operators having local receptive fie…

citations
Cited by 41 publications
(29 citation statements)
references
References 43 publications
“…Video classification [94] is an interesting domain with main focus on effective temporal contents representation that significantly contributes to precise video label prediction. Although the early approaches in video classification are based on simple CNNs [39], but the recent methods employ various temporal [8] and spatio-temporal [34] strategies for video classification. Video classification is further divided into several major domains such as activity recognition, anomaly detection and recognition, and violence detection and recognition.…”
Section: Deep Learning For Vd
mentioning
confidence: 99%
“…The captioning system developed by [36], [37], [38], [39], [40], [41], [42] demonstrated the employment of visual, local, global, adaptive, spatial, temporal, and channel attention for coherent and diverse caption generation. [44], [45], [46], [47] long term dependency handling is not an issue anymore for researchers engaged in video processing for summarization and description, or for autonomous-vehicle, surveillance, and instructional purposes.…”
Section: ) Encoder-decoder (Ed) Based Approaches
mentioning
confidence: 99%
“…The task of nonlinear mapping and feature extraction is extremely challenging; therefore, the best way to tackle these challenges is to employ deep learning models with the ability to extract the discriminative features end-to-end [29,30]. In recent years, the application of deep learning models has significantly improved for image classification [31,32], video classification [33][34][35][36][37], and power forecasting in TS data [38][39][40][41][42]. For instance, Khan et al. [43] proposed a hybrid model for electricity forecasting in residential and commercial buildings.…”
Section: Introduction
mentioning
confidence: 99%