With the growing volume of image data generated by surveillance cameras, automated video analysis has become necessary to detect unusual events. Recently, Deep Learning methods have achieved state-of-the-art results in many computer vision tasks. Among Deep Learning methods, the Autoencoder is commonly used for anomaly detection. This work presents a method to classify frames of four well-known video datasets as normal or anomalous by using reconstruction errors as features for a classifier. To perform this task, Convolutional Autoencoders and One-Class SVMs were employed. Results suggest that the method is capable of detecting anomalies across the four benchmark datasets. We also present a comparison with state-of-the-art approaches and data visualization.
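The idea of turning reconstruction errors into classifier features can be sketched as follows. This is a minimal illustration, not the method of the paper: the autoencoder output is assumed to be already available, the function name `reconstruction_error_features` is hypothetical, and the choice of mean squared error plus maximum absolute error as the two features is an assumption, since the abstract does not say which error statistics were used. The resulting feature vectors would then be passed to a one-class classifier, which is not shown here.

```python
def reconstruction_error_features(frame, reconstruction):
    """Return [mean squared error, max absolute error] for one frame.

    `frame` and `reconstruction` are 2-D lists of pixel intensities with
    the same shape. Which error statistics to use is an assumption of
    this sketch; the abstract does not specify them.
    """
    # Per-pixel absolute difference between the input and its reconstruction.
    diffs = [
        abs(p - q)
        for row, rec_row in zip(frame, reconstruction)
        for p, q in zip(row, rec_row)
    ]
    mse = sum(d * d for d in diffs) / len(diffs)
    return [mse, max(diffs)]
```

A frame that the autoencoder reconstructs poorly yields larger feature values, which is what lets a downstream classifier separate it from frames resembling the training data.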
Deep learning methods are becoming more popular for complex pattern recognition applications. As a result, many frameworks have appeared that aim to facilitate the development of such applications. However, choosing a suitable framework may not be an easy task for new users. In this paper, a qualitative evaluation of four of the most popular Deep Learning frameworks is provided: Caffe, Torch, Lasagne and TensorFlow. A printed character recognition task was used as a case study, and a Convolutional Neural Network was implemented for this purpose. The analysis focuses on issues that are important for the development process and encompasses nine qualitative dimensions, showing the strengths and weaknesses of each framework. It is expected that this analysis can be useful for guiding new users in the area.
The Writer Identification Problem has been widely studied in the field of image processing. Music score writer identification is a particular instance of the problem that requires identifying the writer of a music score, which is a complex task even for musicologists. To address this issue, this paper presents a novel Deep Learning approach based on a Convolutional Neural Network (CNN) for classifying music score images according to their writer. The classification is accomplished by dividing a music score image into patches that are fed to the CNN, which provides a classification result for each patch. A voting system is then applied to obtain the final prediction of the model. This approach allows local features of each music score to be learned in order to improve the final classification result. The proposed approach reaches 84%, 94% and 98% top-1, top-3 and top-5 accuracy, respectively, on the dataset used in this work.
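The patch-level voting step can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract only says that "a voting system" is applied, so plain plurality voting over per-patch labels is assumed here, and the function name `vote_top_k` is hypothetical. The per-patch predictions are taken as given; the CNN itself is not shown.

```python
from collections import Counter


def vote_top_k(patch_predictions, k=1):
    """Return the k writers that received the most patch votes.

    `patch_predictions` holds one predicted writer label per patch of a
    single score image. Plurality voting is an assumption of this sketch.
    """
    counts = Counter(patch_predictions)
    # most_common sorts by vote count; ties keep first-seen order.
    return [writer for writer, _ in counts.most_common(k)]


# Hypothetical per-patch predictions for one music score image.
patches = ["A", "B", "A", "C", "A", "B"]
top1 = vote_top_k(patches, k=1)
top3 = vote_top_k(patches, k=3)
```

With `k=1` the image receives a single predicted writer; larger `k` gives the candidate lists against which top-3 and top-5 accuracy would be measured.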
This work presents a methodology to perform the classification of soft biometrics in images of pedestrians using a Denoising Convolutional Autoencoder as a feature extractor and a Support Vector Machine as a classifier. The Denoising Convolutional Autoencoder was trained with a custom dataset containing a combination of five available datasets (3DPES, Market1501, PRID2011, VIPeR and ETHZ) and used as a feature extractor for the images of the VIPeR dataset. The extracted features were then used as input values for a Support Vector Machine classifier, with its hyper-parameters set by using Grid Search, in order to classify the images according to two soft biometrics or labels: Long-Hair and Sunglasses. The results obtained with the proposed approach were compared to those obtained using another well-known feature extractor: Histogram of Oriented Gradients.