The PASCAL Visual Object Classes Challenge ran from February to March 2005. The goal of the challenge was to recognize objects from a number of visual object classes in realistic scenes (i.e. not pre-segmented objects). Four object classes were selected: motorbikes, bicycles, cars, and people. Twelve teams entered the challenge. In this chapter we provide details of the datasets, the algorithms used by the teams, the evaluation criteria, and the results achieved.
Development of content-based image retrieval (CBIR) techniques has suffered from the lack of standardized ways for describing visual image content. Fortunately, the MPEG-7 international standard is now emerging as both a general framework for content description and a collection of specific agreed-upon content descriptors. We have developed a neural, self-organizing technique for CBIR. Our system is named PicSOM and it is based on pictorial examples and relevance feedback (RF). The name stems from "picture" and the self-organizing map (SOM). The PicSOM system is implemented by using tree structured SOMs. In this paper, we apply the visual content descriptors provided by MPEG-7 in the PicSOM system and compare our own image indexing technique with a reference system based on vector quantization (VQ). The results of our experiments show that the MPEG-7-defined content descriptors can be used as such in the PicSOM system even though Euclidean distance calculation, inherently used in the PicSOM system, is not optimal for all of them. The results also indicate that the PicSOM technique is initially slower than the reference system in finding relevant images. However, when the strong RF mechanism of PicSOM begins to function, its retrieval precision exceeds that of the reference system.
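The Euclidean distance calculation mentioned above can be illustrated with a minimal sketch of how a descriptor vector is matched to its best-matching unit on a SOM codebook. This is a generic illustration, not the PicSOM implementation; the function name and the toy codebook are assumptions for the example.

```python
import math

def best_matching_unit(codebook, descriptor):
    """Return the index of the codebook vector nearest to `descriptor`
    in Euclidean distance (the distance measure PicSOM inherently uses).

    codebook:   list of SOM unit weight vectors (lists of floats).
    descriptor: one feature vector, e.g. an MPEG-7 visual descriptor.
    """
    def dist(unit):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(unit, descriptor)))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))
```

For descriptors whose natural distance is not Euclidean (as the abstract notes for some MPEG-7 descriptors), this same lookup would use the descriptor's recommended matching function instead.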
Digital image libraries are becoming more common and widely used as more visual information is produced at a rapidly growing rate. Content-based image retrieval is an important approach to the problem of processing this increasing amount of data. It is based on automatically extracted features from the content of the images, such as color, texture, shape, and structure. We have started a project to study methods for content-based image retrieval using the Self-Organizing Map (SOM) as the image similarity scoring method. Our image retrieval system, named PicSOM, can be seen as a SOM-based approach to relevance feedback, which is a form of supervised learning to adjust the subsequent queries based on the user's responses during the information retrieval session. In PicSOM, a separate Tree Structured SOM (TS-SOM) is trained for each feature vector type in use. The system then adapts to the user's preferences by returning her more images from those SOMs where her responses have been most densely mapped.
Self-Organising Maps (SOMs) can be used in implementing a powerful relevance feedback mechanism for Content-Based Image Retrieval (CBIR). This paper introduces the PicSOM CBIR system, and describes the use of SOMs as a relevance feedback technique in it. The technique is based on the SOM's inherent property of topology-preserving mapping from a high-dimensional feature space to a two-dimensional grid of artificial neurons. On this grid similar images are mapped in nearby locations. As image similarity must, in unannotated databases, be based on low-level visual features, the similarity of images is dependent on the feature extraction scheme used. Therefore, in PicSOM there exists a separate tree-structured SOM for each different feature type. The incorporation of the relevance feedback and the combination of the outputs from the SOMs are performed as two successive processing steps. The proposed relevance feedback technique is described, analysed qualitatively, and visualised in the paper. Also, its performance is compared with a reference method.
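The relevance feedback mechanism described above can be sketched as follows: the user's marked images place positive and negative impulses on the SOM units where they are mapped, and each impulse is spread to neighbouring units so that regions dense in relevant images accumulate high scores. This is a simplified sketch under stated assumptions: a 1-D grid and a linear spreading kernel stand in for PicSOM's 2-D maps and low-pass filtering, and the function name is hypothetical.

```python
def relevance_map(grid_size, positive_units, negative_units, spread=1):
    """Score SOM units from user feedback (simplified 1-D sketch).

    positive_units / negative_units: indices of units onto which the
    user's relevant / non-relevant images were mapped. Each +1 or -1
    impulse is spread linearly to units within `spread` steps, so
    similar (nearby) images inherit part of the feedback.
    """
    scores = [0.0] * grid_size
    for units, sign in ((positive_units, 1.0), (negative_units, -1.0)):
        for u in units:
            for i in range(grid_size):
                d = abs(i - u)
                if d <= spread:
                    scores[i] += sign * (1.0 - d / (spread + 1))
    return scores
```

In a full system, one such score map would be computed per feature SOM and the maps combined, so that features agreeing with the user's responses dominate the next round of retrieved images.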
Gesture recognition is needed in many applications such as human-computer interaction and sign language recognition. The challenges of building an actual recognition system lie not only in reaching an acceptable recognition accuracy but also in meeting the requirements for fast online processing. In this paper, we propose a method for online gesture recognition using RGB-D data from a Kinect sensor. Frame-level features are extracted from the RGB frames and from the skeletal model obtained from the depth data, and are then classified by multiple extreme learning machines. The outputs of the classifiers are aggregated to produce the final classification results for the gestures. We test our method on the ChaLearn multi-modal gesture challenge data. The results of the experiments demonstrate that the method can perform effective multi-class gesture recognition in real time.
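The aggregation step above can be illustrated with a minimal score-fusion sketch. The abstract only states that classifier outputs are aggregated; averaging the per-class scores and taking the argmax is one common fusion rule and is an assumption here, as is the function name.

```python
def aggregate_scores(per_classifier_scores):
    """Fuse class-score vectors from several classifiers (e.g. ELMs).

    per_classifier_scores: one score vector per classifier, all of the
    same length (one score per gesture class). Scores are averaged
    across classifiers and the class with the highest mean wins.
    Returns (predicted_class_index, averaged_scores).
    """
    n_classifiers = len(per_classifier_scores)
    n_classes = len(per_classifier_scores[0])
    avg = [sum(s[c] for s in per_classifier_scores) / n_classifiers
           for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg
```

Averaging keeps the fusion cheap enough for the real-time constraint the abstract emphasizes, since it adds only one pass over the score vectors per frame.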