Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and part information for image recognition. For fine-grained recognition, a context-aware, rich feature representation of the object/scene plays a key role, since the same subcategory exhibits significant variance while different subcategories differ only subtly. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend to informative integral regions and their importance in discriminating different subcategories, without requiring bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding that exploits the intrinsic consistency between the informativeness of the integral regions and their spatial structures to capture the semantic correlation among them. Our approach is simple yet extremely effective and can be easily applied on top of a standard classification backbone network. We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is highly competitive on the remaining two.
This paper presents a novel keypoints-based attention mechanism for visual recognition in still images. Deep Convolutional Neural Networks (CNNs) have shown great success in recognizing images with distinctive classes, but their performance in discriminating fine-grained changes is not at the same level. We address this by proposing an end-to-end CNN model that learns meaningful features linking fine-grained changes using our novel attention mechanism. It captures the spatial structures in images by identifying semantic regions (SRs) and their spatial distributions, which proves to be the key to modelling subtle changes in images. We automatically identify these SRs by grouping the detected keypoints in a given image. The "usefulness" of these SRs for image recognition is measured using our attention mechanism, which focuses on the parts of the image that are most relevant to a given task. The framework applies to both traditional and fine-grained image recognition tasks and does not require manually annotated regions (e.g. bounding boxes of body parts, objects, etc.) for learning and prediction. Moreover, the proposed keypoints-driven attention mechanism can be easily integrated into existing CNN models. The framework is evaluated on six diverse benchmark datasets. The model outperforms the state-of-the-art approaches by a considerable margin on the Distracted Driver V1 (Acc: +3.39%), Distracted Driver V2 (Acc: +6.58%), Stanford-40 Actions (mAP: +2.15%), People Playing Musical Instruments (mAP: +16.05%), Food-101 (Acc: +6.30%) and Caltech-256 (Acc: +2.59%) datasets.
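The core idea above — scoring each semantic region's relevance and pooling region descriptors by those scores — can be illustrated with a minimal NumPy sketch. This is not the paper's actual architecture; `w` and `v` stand in for parameters that would be learned end-to-end, and the region descriptors are assumed to come from a backbone CNN.

```python
import numpy as np

def attention_pool(region_feats, w, v):
    """Attention-weighted pooling of semantic-region descriptors.

    region_feats: (R, D) array, one descriptor per detected region
                  (assumed to come from a backbone CNN).
    w: (D, H) projection matrix, v: (H,) scoring vector -- learned in
    practice; here they are illustrative placeholders.
    Returns a single (D,) image descriptor and the attention weights.
    """
    scores = np.tanh(region_feats @ w) @ v          # one relevance score per region
    scores = scores - scores.max()                  # subtract max for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over the R regions
    pooled = alpha @ region_feats                   # attention-weighted sum of descriptors
    return pooled, alpha

# Toy usage: 5 hypothetical regions with 8-dim descriptors.
rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 8))
pooled, alpha = attention_pool(feats, rng.normal(size=(8, 4)), rng.normal(size=4))
```

The pooled vector can then be fed to a standard classification head; regions with higher `alpha` dominate the representation, which is how the mechanism emphasizes task-relevant parts.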
This paper presents a new technique for user identification and recognition based on the fusion of hand-geometric features of both hands without any pose restrictions. All features are extracted from normalized left- and right-hand images. Fusion is applied at the feature level and also at the decision level. Two probability-based algorithms are proposed for classification. The first computes the maximum probability over the three nearest neighbors. The second determines the maximum probability from the number of matched features with respect to a threshold on distances. Based on these two highest probabilities, initial decisions are made. The final decision is taken according to the highest probability as calculated by the Dempster-Shafer theory of evidence. Depending on the various combinations of the initial decisions, three schemes are evaluated with 201 subjects for identification and verification. The correct identification rate is found to be 99.5%, and a false acceptance rate (FAR) of 0.625% is observed during verification. (Int. J. Patt. Recogn. Artif. Intell. 29, 2015.)

Biometric authentication complements knowledge-based (password) and token-based (PIN) user verification systems at the various required levels (low to high) of security intelligence [9]. It is applied as one of the most reliable and legitimate human-authentication systems in constrained environments. The primary objective is to discriminate the identity of a person based on various unique biometric properties recognized by an automated system. These systems are built on the physical (e.g. face, fingerprint, hand geometry, hand vein) and behavioral (e.g. gait, signature, voice) distinctiveness of an individual [10]. Different human organs are employed individually (unimodal) or in combination (multimodal) for this purpose.
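The final decision step described above combines the two classifiers' outputs with the Dempster-Shafer theory of evidence. A minimal sketch of Dempster's rule of combination is given below; the mass values and the two-identity hypothesis space are purely illustrative, not taken from the paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    m1, m2: dicts mapping frozenset hypotheses to masses (each summing to 1).
    Masses whose hypotheses intersect reinforce the intersection; mass on
    disjoint hypotheses is conflict and is normalized away.
    """
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    norm = 1.0 - conflict
    return {h: m / norm for h, m in combined.items()}

# Toy example: two classifiers (e.g. a nearest-neighbor score and a
# matched-feature score) support candidate identities "A" and "B".
m_knn = {frozenset({"A"}): 0.7, frozenset({"B"}): 0.2, frozenset({"A", "B"}): 0.1}
m_match = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.3, frozenset({"A", "B"}): 0.1}

fused = dempster_combine(m_knn, m_match)
decision = max(fused, key=fused.get)  # identity with the highest fused belief
```

The final identity is the hypothesis with the highest combined mass, mirroring the paper's "highest probability" decision rule.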
A standalone recognition decision by a unimodal system is not always reliable and robust enough to verify whether a person is genuine or an imposter. To enhance performance, fusion [17] is therefore very useful for authenticating an individual with multibiometrics: it combines several decisions taken by various expert systems. Multibiometric systems provide certain benefits over the limitations of unimodal systems, such as noise effects, intra-class variations, non-universality, spoof attacks and inflexibility [9]. Different multibiometric systems are available and are classified according to their basic properties (e.g. multi-sensor) and level(s) of implementation (e.g. decision level). An enormous number of biometric systems with different characteristics and functionalities have been running successfully worldwide for several decades in government (national ID cards), forensic (criminal investigations) and commercial applications (smart cards) [19].