Transfer learning for the classification of video-recorded crowd movements

Bendali-Braham, Mounir; Weber, Jonathan; Forestier, Germain; Idoumghar, Lhassane; Müller, Pierre-Alain

doi:10.1109/ispa.2019.8868704

Cited by 7 publications (8 citation statements). References 31 publications.


“…In a previous work on the Crowd-11 dataset, we showed that models from the 2S-I3D network perform better than the C3D network and the Inflated 3D Nets (I3D) [6]. However, the 2S-I3D results peak approximately at 68% accuracy.…”

Section: A. Creation of Homogeneous Models Ensembles (mentioning, confidence: 84%)

“…The model that obtains the best classification results in their article derives from the C3D architecture [5]. In previous works, we obtained better results [6], by using a model derived from the Two-Stream Inflated 3D architecture (2S-I3D) which already outperforms the C3D models on action recognition datasets [3].…”

Section: Introduction (mentioning, confidence: 94%)

“…We opt for a compromise between a form of Stacking [25], without a meta-classifier because we combine the models at the evaluation phase, and a form of Bagging, because we perform an aggregation of models without applying Bootstrap sampling. Here the samples are the folds already obtained following the cross validation that we did for our previous work [6]. Our split is stratified, which means that each fold maintains the classes distribution of the original dataset.…”

Section: Review (mentioning, confidence: 99%)
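The stratified split mentioned in the excerpt above (each fold keeps the class distribution of the original dataset) can be sketched as follows. This is a minimal illustration in plain Python, assuming the labels are given as a list; the function name `stratified_folds` is hypothetical and is not the authors' code. The indices of each class are shuffled and dealt round-robin across the folds, so every fold receives a near-equal share of every class.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that each keep (approximately)
    the class proportions of `labels`. Hypothetical helper for illustration."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across the folds, starting
        # where the previous class stopped so fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset = (offset + len(indices)) % k
    return folds


labels = ["a"] * 10 + ["b"] * 5
folds = stratified_folds(labels, 5)  # each fold: 2 samples of "a", 1 of "b"
```

Libraries such as scikit-learn provide an equivalent splitter (`StratifiedKFold`); the hand-written version only makes the mechanism explicit.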

“…Choosing a small number of models that are already yielding good results is enough to yield better results when they end up gathering into an Ensemble. Therefore, we decided to split the dataset into 5 folds, as it was already done in Bendali-Braham et al [6], which allows each Ensemble to be equipped with 4 single models that extract different knowledge from the Crowd-11 dataset.…”

Section: Review (mentioning, confidence: 99%)
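Combining the fold models at evaluation time, without a meta-classifier, could look like the sketch below. The excerpts do not state the combination rule; averaging the members' predicted class probabilities (soft voting) is an assumption made here, and `average_probabilities` and `ensemble_predict` are hypothetical names. With 4 member models, each test sample's final class is the argmax of the 4 averaged probability vectors.

```python
def average_probabilities(member_probs):
    """member_probs[m][n][c] is the probability that member model m assigns
    to class c for sample n. Returns the per-sample mean over all members."""
    n_models = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    avg = [[0.0] * n_classes for _ in range(n_samples)]
    for probs in member_probs:
        for n in range(n_samples):
            for c in range(n_classes):
                avg[n][c] += probs[n][c] / n_models
    return avg


def ensemble_predict(member_probs):
    """Soft voting: predict the class with the highest averaged probability."""
    avg = average_probabilities(member_probs)
    return [max(range(len(row)), key=row.__getitem__) for row in avg]


# Four hypothetical fold models, two samples, three classes.
members = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
    [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]],
    [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]],
    [[0.7, 0.2, 0.1], [0.2, 0.2, 0.6]],
]
print(ensemble_predict(members))  # [0, 2]
```

Note that member 3 alone would predict class 1 for the first sample; the average overrides it, which is the kind of correction an Ensemble is expected to bring.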


Ensemble classification of video-recorded crowd movements

Bendali-Braham, Weber, Forestier, et al. 2021

2021 12th International Symposium on Image and Signal Processing and Analysis (ISPA)

Self-citation

Ensemble learning methods often improve on the results of single Machine Learning models. In this work, we apply Ensemble Learning to video-recorded crowd movements. First, we build Ensembles of homogeneous Convolutional Neural Networks (CNN), compare their performance on the Crowd-11 dataset, and show the performance gain of these Ensembles over single CNN models. Second, we evaluate all possible combinations of these homogeneous Ensembles to build a global Ensemble of heterogeneous models, and we analyze the combination of Ensembles that achieves the best results. Our experiments reveal that Ensemble classification often obtains better results than single models, and that combining different Ensembles can further improve prediction accuracy.
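The exhaustive search over combinations of homogeneous Ensembles described in this abstract can be sketched as below. Assumptions: each Ensemble is represented by its per-sample class probabilities on a held-out set, a combination is scored by accuracy after summing (equivalently, averaging) those probabilities, and the function `best_combination` and the Ensemble names `"A"`, `"B"`, `"C"` are hypothetical placeholders, not the paper's models or results. With n Ensembles there are 2^n - 1 non-empty combinations, which stays cheap for a handful of Ensembles.

```python
from itertools import combinations


def best_combination(ensemble_probs, labels):
    """ensemble_probs maps an Ensemble name to its per-sample class
    probabilities. Tries every non-empty subset of Ensembles and returns
    (best_accuracy, subset); ties keep the first (smallest) subset found."""
    names = sorted(ensemble_probs)
    best = (-1.0, ())
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            correct = 0
            for n, label in enumerate(labels):
                n_classes = len(ensemble_probs[subset[0]][n])
                # Summing gives the same argmax as averaging.
                scores = [sum(ensemble_probs[name][n][c] for name in subset)
                          for c in range(n_classes)]
                pred = max(range(n_classes), key=scores.__getitem__)
                correct += pred == label
            accuracy = correct / len(labels)
            if accuracy > best[0]:
                best = (accuracy, subset)
    return best


labels = [0, 1, 1]
probs = {
    "A": [[0.9, 0.1], [0.6, 0.4], [0.4, 0.6]],
    "B": [[0.4, 0.6], [0.2, 0.8], [0.6, 0.4]],
    "C": [[0.7, 0.3], [0.3, 0.7], [0.55, 0.45]],
}
print(best_combination(probs, labels))  # (1.0, ('A', 'C'))
```

In this toy data no single Ensemble classifies all three samples correctly, but the pair A + C does, mirroring the abstract's observation that combining different Ensembles can improve accuracy.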


“…6, for its good performance and efficiency, the popular Convolutional 3D Networks (C3D) [43] is selected as our pre-trained feature extractor. Recent studies [44][45][46][47] have shown that fine tuning of a more complex dataset results in excellent classification and detection performance using a pre-trained Sports-1M dataset model [48]. The reason for this training procedure is that the 3D CNN receives general representation of video clips from pre-training.…”

Section: Feature Extraction Through the Pre-trained C3D Model (mentioning, confidence: 99%)

Deep anomaly detection through visual attention in surveillance videos

et al. 2020


This paper describes a method for learning anomalous behavior in video by finding an attention region from spatiotemporal information, in contrast to full-frame learning. In the proposed method, a robust background subtraction (BG) step extracts motion and indicates the location of attention regions. The resulting regions are then fed into a three-dimensional Convolutional Neural Network (3D CNN). Specifically, taking advantage of C3D (Convolutional 3D) to fully exploit spatiotemporal relations, a deep convolutional network is developed to distinguish normal from anomalous events. The system is trained and tested on the large-scale UCF-Crime anomaly dataset to validate its effectiveness. This dataset contains 1900 long, untrimmed real-world surveillance videos, split into 950 anomalous events and 950 normal events. In total, approximately 13 million frames are used during the training and testing phases. As shown in the experiments section, the proposed visual attention model obtains an accuracy of 99.25%. From an industrial application point of view, extracting this attention region can help security officers focus on the corresponding anomalous region instead of inspecting the full, wider frame.
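The idea of locating an attention region by background subtraction before passing it to a 3D CNN can be illustrated with a deliberately simple stand-in: an exponential running-average background model and a fixed difference threshold on grayscale frames stored as nested lists. This is not the "robust" BG method of the paper, whose details the abstract does not give; the functions `update_background`, `attention_box` and `crop`, and the `alpha` and `threshold` values, are hypothetical choices for illustration only.

```python
def update_background(background, frame, alpha=0.05):
    """Running-average background model: B <- (1 - alpha) * B + alpha * F."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]


def attention_box(background, frame, threshold=30):
    """Bounding box (top, left, bottom, right) of the pixels whose absolute
    difference from the background exceeds `threshold`; None if no motion."""
    ys, xs = [], []
    for y, (brow, frow) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                ys.append(y)
                xs.append(x)
    if not ys:
        return None
    return min(ys), min(xs), max(ys), max(xs)


def crop(frame, box):
    """Cut the attention region (inclusive box) out of a frame."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


background = [[0] * 6 for _ in range(6)]
frame = [row[:] for row in background]
frame[2][3] = 200  # two "moving" pixels
frame[3][4] = 200
box = attention_box(background, frame)  # (2, 3, 3, 4)
region = crop(frame, box)               # [[200, 0], [0, 200]]
```

In a full pipeline, the cropped regions from consecutive frames would be resized and stacked into a clip for the 3D CNN; practical systems usually replace the fixed threshold with an adaptive model such as a Gaussian mixture background subtractor.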
