In this paper we present the results of the Unconstrained Ear Recognition Challenge (UERC), a group benchmarking effort centered on person recognition from ear images captured in uncontrolled conditions. The goal of the challenge was to assess the performance of existing ear recognition techniques on a challenging large-scale dataset and to identify open problems that need to be addressed in the future. Five groups from three continents participated in the challenge and contributed six ear recognition techniques for the evaluation, while multiple baselines were made available by the UERC organizers. A comprehensive analysis was conducted with all participating approaches, addressing essential research questions pertaining to the sensitivity of the technology to head rotation, flipping, gallery size, large-scale recognition, and other factors. The top performer of the UERC delivered robust performance on a smaller part of the dataset (with 180 subjects) regardless of image characteristics, but still exhibited a significant performance drop when the entire dataset comprising 3,704 subjects was used for testing.
Background

There has been considerable work on tracking systems; see, for example, [11] [9]. Our system draws ideas from these and other earlier work. While many of the basic ideas are similar, the details are often quite different, and they are what account for the system's unique abilities. Some of the major differences stem from our area of application. Our goal is to track targets in a perimeter-security setting, i.e., outdoor operation in areas of moderate to high cover. We seek real-time algorithms suitable for COTS (commercial off-the-shelf) computing, and use x86-based processors. This domain of application significantly restricts the techniques that can be applied. Some of the constraints, and their implications for our system, include:

- The lighting varies naturally. We must handle sunlight filtered through trees and intermittent cloud cover. (We are not yet considering IR cameras.)
- Targets use camouflage, so it is unlikely that color will add much information. Figure 3 shows an example scene with a sniper in the grass.
- Targets will be moving in areas with large amounts of occlusion; finding and classifying outlines will be difficult.
- Trees, brush, and clouds all move. The system must have algorithms to help distinguish these "insignificant" motions from target motions.
- Many targets will move slowly (less than 1/60 pixel per frame); some will move even more slowly, and some will try very hard to blend into the motion of the trees and brush. Therefore frame-to-frame differencing is of limited value. Temporal adaptation schemes must not add slow targets to the background.
- Targets will not, in general, be "upright" or isolated. Thus we have not added "labeling" of targets based on simple shape/scale/orientation models.
- Targets need to be detected quickly, while they are still very small and distant, e.g., with about 10-20 pixels on target.
- Correlation, template matching, and related techniques cannot be used effectively because of large amounts of occlusion, and because in a paraimage image translation is a very poor model: objects translating in the world undergo rotation and non-linear scaling.

Note that, except for the last, these are all generic problem constraints and are not dependent on the geometry of the paraimage. A system that can track under these constraints can be used in many situations, not just omni-directional tracking in outdoor settings. We also note that the detection phase is crucial; if targets are not detected, they will not be tracked. Detection is also an area where the domain constraints make the problem more difficult than the situations considered in most past papers. As a result, much of this paper (and the system's effort) is concentrated on the detection phase. Because of the camouflage and …

(This work was supported in part by DARPA Image Understanding's VSAM program.)
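The constraint that temporal adaptation must not add slow targets to the background can be sketched with a simple running-average background model that freezes updates at detected pixels. This is an illustrative assumption, not this paper's implementation; the function names, learning rate, and threshold are all hypothetical.

```python
import numpy as np

def detect(bg, frame, threshold=25.0):
    """Flag pixels whose deviation from the background model exceeds a threshold."""
    return np.abs(frame.astype(float) - bg) > threshold

def update_background(bg, frame, detection_mask, alpha=0.01):
    """Exponential running-average background update.

    Pixels currently flagged as target are excluded from the update, so
    that slow-moving targets are not absorbed into the background while
    gradual changes (lighting, swaying brush) are still adapted to.
    """
    out = bg.copy()
    keep = ~detection_mask
    out[keep] = (1.0 - alpha) * out[keep] + alpha * frame[keep].astype(float)
    return out
```

A small `alpha` makes the model adapt slowly, which is the trade-off the text alludes to: fast enough to track lighting drift, slow enough not to erase a creeping target.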
In this paper, we introduce the notion of a programmable imaging system. Such a system gives a human user or a vision system significant control over the radiometric and geometric characteristics of the imaging process. This flexibility is achieved using a programmable array of micro-mirrors. The orientations of the mirrors of the array can be controlled with high precision over space and time. This enables the system to select and modulate rays from the light field based on the needs of the application at hand. We have implemented a programmable imaging system that uses a digital micro-mirror device (DMD), of the kind used in digital light processing. Although the mirrors of this device can only be positioned in one of two states, we show that our system can be used to implement a wide variety of imaging functions, including high dynamic range imaging, feature detection, and object recognition. We conclude with a discussion on how a micro-mirror array can be used to efficiently control the field of view without the use of moving parts.

A Flexible Approach to Imaging

In the past few decades, a wide variety of novel imaging systems have been proposed that have fundamentally changed the notion of a camera. These include high dynamic range, multispectral, omnidirectional, and multiviewpoint imaging systems. The hardware and software of each of these devices are designed to accomplish a particular imaging function, and this function cannot be altered without significant redesign. In this paper, we introduce the notion of a programmable imaging system. Such a system gives a human user or a computer vision system significant control over the radiometric and geometric properties of the system. This flexibility is achieved by using a programmable array of micro-mirrors. The orientations of the mirrors of the array can be controlled at very high speed. This enables the system to select and modulate scene rays based on the needs of the application at hand.
The end result is a single imaging system that can emulate the functionalities of several existing specialized systems as well as new ones.* The basic principle behind the proposed approach is illustrated in Figure 1. The system observes the scene via a two-dimensional array of micro-mirrors whose orientations can be controlled. The surface normal n_i of the i-th mirror determines the direction of the scene ray it reflects into the imaging system. If the normals of the mirrors can be chosen arbitrarily, each mirror can be programmed to select from a continuous cone of scene rays. In addition, each mirror can also be oriented with normal n_b such that it reflects a black surface (with zero radiance). Let the integration time of the image detector be T. If the mirror is made to point in the directions n_i and n_b for durations t and T − t, respectively, the scene ray is attenuated by t/T. As a result, each imaged scene ray can also be radiometr...

*This work was done at the Columbia Center for Vision and Graphics. It was supported by an ONR contract (N00014-03-1-0023).
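The t/T attenuation described above can be sketched per mirror as follows. This is illustrative numpy, assuming ideal two-state mirrors and negligible transition time; the function name and array layout are our own.

```python
import numpy as np

def modulated_image(scene, dwell, T):
    """Radiometric modulation by a two-state micro-mirror array.

    scene : 2-D array of scene radiances (one ray per mirror)
    dwell : 2-D array of times t each mirror spends at normal n_i;
            it spends the remaining T - t at n_b, reflecting a
            zero-radiance black surface
    T     : detector integration time

    Each imaged ray is attenuated by t / T, as derived in the text.
    """
    dwell = np.clip(np.asarray(dwell, dtype=float), 0.0, T)
    return np.asarray(scene, dtype=float) * (dwell / T)
```

Programming the per-mirror dwell times is what lets a single detector realize different radiometric functions, e.g., compressing a high-dynamic-range scene, without any moving parts other than the mirrors themselves.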
Facial attributes are soft biometrics that allow limiting the search space, e.g., by rejecting identities with non-matching facial characteristics such as nose sizes or eyebrow shapes. In this paper, we investigate how the latest versions of deep convolutional neural networks, ResNets, perform on the facial attribute classification task. We test two loss functions, the sigmoid cross-entropy loss and the Euclidean loss, and find that there is little difference between them in classification performance. Using an ensemble of three ResNets, we obtain a new state-of-the-art facial attribute classification error of 8.00% on the aligned images of the CelebA dataset. More significantly, we introduce the Alignment-Free Facial Attribute Classification Technique (AFFACT), a data augmentation technique that allows a network to classify facial attributes without requiring alignment beyond detected face bounding boxes. To the best of our knowledge, we are the first to report comparable accuracy using only the detected bounding boxes, rather than requiring alignment based on automatically detected facial landmarks, and the first to improve classification accuracy by rotating and scaling test images. We show that this approach outperforms the CelebA baseline on unaligned images with a relative improvement of 36.8%.
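The two losses compared above can be written, for a vector of per-attribute logits, roughly as below. Treating the Euclidean loss as operating on sigmoid-squashed outputs with {0, 1} targets is our illustrative assumption, not necessarily the paper's exact formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, targets):
    """Mean binary cross-entropy over all attributes; targets in {0, 1}."""
    p = sigmoid(logits)
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(targets * np.log(p + eps)
                          + (1.0 - targets) * np.log(1.0 - p + eps)))

def euclidean_loss(logits, targets):
    """Mean squared error between squashed outputs and binary targets."""
    return float(np.mean((sigmoid(logits) - targets) ** 2))
```

Both losses treat the 40 CelebA attributes as independent binary decisions per image, which is why they can be swapped with little effect on classification accuracy.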