2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA) 2019
DOI: 10.1109/ispa.2019.8868704

Transfer learning for the classification of video-recorded crowd movements

Abstract: The automatic recognition of crowd movements captured by a CCTV camera can be of considerable help to security forces whose mission is to ensure the safety of people in public areas. In this context, we propose to fine-tune a model from the TwoStream Inflated 3D architecture, pre-trained on the ImageNet and Kinetics source datasets, to classify video sequences of crowd movements from the Crowd-11 target dataset. The evaluation of our model demonstrates its superiority over the state of the art in terms…
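
As an illustration of how a two-stream classifier can arrive at a single class label, here is a minimal sketch of late fusion: the per-class scores of an RGB stream and an optical-flow stream are averaged and passed through a softmax. This is an assumption for illustration only; the abstract does not state how the authors combine the two streams. The function names and the count of 11 classes (inferred from the dataset name Crowd-11) are likewise hypothetical.

```python
import math

# Assumption: 11 crowd-movement classes, inferred from the name "Crowd-11".
NUM_CLASSES = 11


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_two_streams(rgb_logits, flow_logits):
    """Average the per-class scores of the two streams, then normalize.

    Hypothetical helper: averaging is one common late-fusion choice,
    not necessarily the one used in the paper.
    """
    if len(rgb_logits) != len(flow_logits):
        raise ValueError("both streams must score the same set of classes")
    averaged = [(r + f) / 2.0 for r, f in zip(rgb_logits, flow_logits)]
    return softmax(averaged)


def predict(rgb_logits, flow_logits):
    """Return the index of the most probable class after fusion."""
    probs = fuse_two_streams(rgb_logits, flow_logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

In use, each stream would produce one score per class for a video clip, and `predict` would return the index of the fused winner. Averaging scores rather than probabilities keeps a confident stream from being drowned out by an uncertain one, but other fusion rules are equally plausible.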

Cited by 7 publications (8 citation statements)
References 31 publications
“…In a previous work on the Crowd-11 dataset, we showed that models from the 2S-I3D network perform better than the C3D network and the Inflated 3D Nets (I3D) [6]. However, the 2S-I3D results peak approximately at 68% accuracy.…”
Section: A Creation Of Homogeneous Models Ensembles (mentioning)
confidence: 84%
“…The model that obtains the best classification results in their article derives from the C3D architecture [5]. In previous works, we obtained better results [6], by using a model derived from the TwoStream Inflated 3D architecture (2S-I3D) which already outperforms the C3D models on action recognition datasets [3].…”
Section: Introduction (mentioning)
confidence: 94%
“…6, for its good performance and efficiency, the popular Convolutional 3D Networks (C3D) [43] is selected as our pre-trained feature extractor. Recent studies [44][45][46][47] have shown that fine tuning of a more complex dataset results in excellent classification and detection performance using a pre-trained Sports-1M dataset model [48]. The reason for this training procedure is that the 3D CNN receives general representation of video clips from pre-training.…”
Section: Feature Extraction Through the Pre-trained C3d Model (mentioning)
confidence: 99%