With the growth of image data being generated by surveillance cameras, automated video analysis has become necessary in order to detect unusual events. Recently, Deep Learning methods have achieved state-of-the-art results in many computer vision tasks. Among Deep Learning methods, the Autoencoder is commonly used for anomaly detection. This work presents a method to classify frames of four well-known video datasets as normal or anomalous by using reconstruction errors as features for a classifier. To perform this task, Convolutional Autoencoders and One-Class SVMs were employed. Results suggest that the method is capable of detecting anomalies across the four benchmark datasets. We also present a comparison with state-of-the-art approaches and data visualization.
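The pipeline described above (reconstruction errors used as features for a One-Class SVM) can be illustrated with a minimal sketch. This is not the authors' implementation: a linear autoencoder (PCA computed with NumPy) stands in for the Convolutional Autoencoder, the 8x8 "frames" are synthetic, and the names `make_normal`, `make_anomalous`, and `error_features`, as well as the choice of one error value per image quadrant, are assumptions made only for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
SIDE, RANK = 8, 5

# Synthetic stand-in for video frames: normal frames lie near a low-dimensional subspace.
basis = rng.normal(size=(RANK, SIDE * SIDE))

def make_normal(n):
    """Hypothetical normal frames: low-rank signal plus small noise."""
    return rng.normal(size=(n, RANK)) @ basis + 0.05 * rng.normal(size=(n, SIDE * SIDE))

def make_anomalous(n):
    """Hypothetical anomaly: a bright patch added to the top-left quadrant."""
    frames = make_normal(n).reshape(n, SIDE, SIDE)
    frames[:, :4, :4] += 3.0
    return frames.reshape(n, -1)

train, test_normal, test_anom = make_normal(300), make_normal(100), make_anomalous(100)

# Linear autoencoder stand-in (PCA), fitted on normal frames only.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:RANK]

def reconstruct(x):
    """Encode to RANK coefficients and decode back to pixel space."""
    return (x - mean) @ components.T @ components + mean

def error_features(x):
    """Mean squared reconstruction error in each 4x4 quadrant: 4 features per frame."""
    err = ((x - reconstruct(x)) ** 2).reshape(-1, 2, 4, 2, 4)
    return err.mean(axis=(2, 4)).reshape(len(x), -1)

# One-class classifier trained only on the error features of normal frames.
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(error_features(train))

# predict() returns -1 for frames the model considers anomalous.
false_alarm_rate = float(np.mean(clf.predict(error_features(test_normal)) == -1))
detection_rate = float(np.mean(clf.predict(error_features(test_anom)) == -1))
print(f"false alarms on normal frames: {false_alarm_rate:.2f}")
print(f"anomalous frames flagged:      {detection_rate:.2f}")
```

Because the model is fitted only on normal frames, anomalous content is reconstructed poorly and its error features fall far outside the region learned by the One-Class SVM. With a real Convolutional Autoencoder, only `reconstruct` would need to change.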
In One-Class Classification (OCC) problems, the classifier is trained with samples of a class considered normal, such that exceptional patterns can be identified as anomalies. Indeed, for real-world problems, the representation of the normal class in the feature space is an important issue, considering that one or more clusters can describe different aspects of normality. For classification purposes, it is important that these clusters be as compact (dense) as possible, to better discriminate anomalous patterns, which is a recurrent problem in OCC tasks. This work introduces a hybrid approach using deep learning and One-Class Support Vector Machine (OC-SVM) methods, named Convolutional Autoencoder with Compact Embedding (CAE-CE), for enhancing the compactness of clusters in the feature space. Such an approach is still underexplored in the literature, being restricted to models within the context of metric learning. Additionally, the absence of anomalous samples during training makes it difficult to determine when to interrupt the learning process so as to avoid over-compression of the normal examples, which results in overfitting of the model. In this work, we propose a novel sensitivity-based stop criterion and assess its suitability for OCC problems. Using an OC-SVM for the classification task, several experiments were conducted using publicly available image and video datasets. We also introduce two other new benchmarks, specifically designed for video anomaly detection on highways. The final performance of the proposed method was compared with a baseline Convolutional Autoencoder (CAE). Overall results suggest that the enhanced compactness introduced by the CAE-CE improved the classification performance for most datasets. Also, the qualitative analysis of frames at the visual level indicated that features learned by CAE-CE are closely correlated to the anomalous events.
Soft biometrics classification has been gaining acceptance in recent years for critical applications, mainly in the security field. Recognizing individuals by using only behavioral, physical or psychological characteristics is a task that can be helpful for several purposes. Thus, different Deep Learning approaches have been proposed to perform this task. Since these methods require a large amount of data to avoid overfitting, data augmentation is a commonly used method. However, its isolated effect on the performance of the models is usually not evaluated. This work aims at studying the effect of different data augmentation strategies on the performance of two Convolutional Neural Network architectures for classifying soft biometrics attributes from samples of a novel dataset: LABICv1. Recently, several works appeared, aiming at solving this problem through different strategies, usually Deep Learning (DL) [2] approaches such as Convolutional Neural Networks (CNNs) [3]. For instance, Perlin and Lopes [4] presented two CNNs with the same architecture but with different operation modes: one for classifying three soft biometrics (Upper Clothes, Lower Clothes and Gender) at once and the other for classifying a single soft biometric. The first was trained using the negative log-likelihood as loss function and the second with the mean squared error. In the work presented by Levi and Hassner [5], age and gender of individuals were classified from images of human faces using a deep CNN, whilst Wang et al. [6] presented an approach based on a 6-layer CNN architecture for feature extraction, in order to estimate age from images containing faces. The works by Zhu et al. [7] and Martinho-Corbishley et al. [8] presented architectures based on slicing images of individuals and feeding each sliced window to different input layers of the network.
Each input is propagated through separate Convolution and Pooling layers until reaching the Fully Connected layer, where the outputs of each layer are combined to form a unique flattened vector. Both approaches allow multi-label classification. All works used variations of the Stochastic Gradient Descent (SGD) method [9]. Since DL models require a large amount of data to obtain satisfactory results and soft biometrics datasets are usually small, the use of data augmentation is common to reduce overfitting and improve the classification performance [10,11]. This method is based on generating new samples of the original dataset by applying small random transformations to the original samples, whilst preserving their labels. Works related to soft biometrics classification are often focused on the architecture of the model without evaluating the particular isolated effect of data augmentation. This work presents a study regarding the effect of data augmentation on the performance of CNN architectures with different complexities for a small dataset (which can also be unbalanced depending on the attribute to classify). For this purpose, we present a new labeled dataset f...
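The data augmentation idea described in this excerpt (new samples generated by small random transformations that preserve labels) can be sketched as follows. This is a minimal NumPy illustration, not the augmentation strategy evaluated in the paper: the function name `augment`, its parameters, and the choice of horizontal flips and small circular shifts are assumptions made for the example.

```python
import numpy as np

def augment(images, labels, copies=2, max_shift=2, seed=0):
    """Return the original samples plus `copies` randomly transformed versions.

    images: array of shape (N, H, W); labels: array of shape (N,).
    Each transformed image keeps the label of the image it was derived from.
    """
    rng = np.random.default_rng(seed)
    out_images, out_labels = [images], [labels]
    for _ in range(copies):
        batch = images.copy()
        # Random horizontal flip, applied to roughly half of the images.
        flip = rng.random(len(batch)) < 0.5
        batch[flip] = batch[flip, :, ::-1]
        # Small random horizontal shift (circular: pixels wrap around the border).
        shifts = rng.integers(-max_shift, max_shift + 1, size=len(batch))
        batch = np.stack([np.roll(img, s, axis=1) for img, s in zip(batch, shifts)])
        out_images.append(batch)
        out_labels.append(labels)
    return np.concatenate(out_images), np.concatenate(out_labels)

# Toy usage: 4 images of size 6x6 with binary labels, expanded to 16 samples.
imgs = np.arange(4 * 6 * 6, dtype=float).reshape(4, 6, 6)
lbls = np.array([0, 1, 1, 0])
aug_x, aug_y = augment(imgs, lbls, copies=3)
print(aug_x.shape, aug_y.shape)
```

Isolating the augmentation in one function makes it straightforward to train the same architecture with and without it, which is the kind of comparison the study is concerned with.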
Deep learning methods are becoming more popular for complex pattern recognition applications. As a result, many frameworks have appeared aiming to facilitate the development of such applications. However, choosing a suitable framework may not be an easy task for new users. In this paper, a qualitative evaluation of four of the most popular Deep Learning frameworks is provided: Caffe, Torch, Lasagne and TensorFlow. A printed character recognition task was used as a case study, and a Convolutional Neural Network was implemented for this purpose. The analysis focuses on issues that are important for the development process and encompasses nine qualitative dimensions, showing the strengths and weaknesses of each framework. It is expected that this analysis can be useful for guiding new users in the area.
Recent research has shown that features obtained from pretrained Convolutional Neural Network (CNN) models can be readily applied to a variety of problems they were not originally designed to solve. This concept, often referred to as Transfer Learning (TL), is a common practice when labeled data is limited. In some fields, such as video anomaly detection, TL is still an underexplored subject in the sense that it is not clear whether the architecture of the pretrained CNN model affects video anomaly detection performance. In order to clarify this issue, we perform an extensive benchmark using 12 different CNN models pretrained on ImageNet as feature extractors and apply the features obtained to seven video anomaly detection benchmark datasets. This work presents some interesting findings about video anomaly detection using TL. Most notably, our experiments show that a simple classification process using One-Class Support Vector Machines yields results similar to state-of-the-art models. Moreover, a statistical analysis suggests that architectural differences are negligible when choosing a pretrained model for video anomaly detection, since all models presented similar performance. Finally, we present an in-depth visual analysis of the Avenue dataset, and reveal several aspects that may be limiting the performance of state-of-the-art video anomaly detection methods.