Abstract-This paper explores the process of self-guided learning of realistic facial expression production by a robotic head with 31 degrees of freedom. Facial motor parameters were learned using feedback from real-time facial expression recognition from video. The experiments show that these control properties can be learned through an active exploration and feedback loop. The mapping of servos to expressions was learned in under one hour of training time. We discuss how our work may help illuminate the computational study of how infants learn to make facial expressions.
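The active exploration and feedback loop described above can be sketched as stochastic hill climbing over servo parameters, with the recognizer's confidence as the reward signal. The sketch below is illustrative only: the robot-head interface and the video-based recognizer are replaced by placeholder stubs (set_servo_positions, expression_score), and the simple hill-climbing rule is an assumption, not the paper's actual learning algorithm.

import numpy as np

N_SERVOS = 31
rng = np.random.default_rng(0)

# --- Placeholder environment (assumption: stands in for the real robot head
# --- and the video-based expression recognizer described in the abstract) ---
_true_pose = rng.uniform(0.0, 1.0, size=N_SERVOS)   # unknown target servo pose

def set_servo_positions(params):
    pass  # would command the 31 servos on real hardware

def expression_score(params):
    # Stand-in for recognizer confidence: higher when closer to the target pose.
    return float(np.exp(-np.linalg.norm(params - _true_pose) ** 2))

# --- Active exploration + feedback loop (stochastic hill climbing) -----------
def learn_expression(iters=2000, step=0.05):
    params = rng.uniform(0.0, 1.0, size=N_SERVOS)    # normalized servo commands
    set_servo_positions(params)
    best = expression_score(params)
    for _ in range(iters):
        # Explore: perturb the servo parameters slightly.
        trial = np.clip(params + step * rng.standard_normal(N_SERVOS), 0.0, 1.0)
        set_servo_positions(trial)
        score = expression_score(trial)
        # Feedback: keep the perturbation only if recognition confidence improves.
        if score > best:
            params, best = trial, score
    return params, best

if __name__ == "__main__":
    pose, conf = learn_expression()
    print(f"final recognizer confidence: {conf:.3f}")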
Recent years have seen the development of fast and accurate algorithms for detecting objects in images. However, as the size of the scene grows, so do the running times of these algorithms. If a 128 × 102 pixel image requires 20ms to process, searching for objects in a 1280 × 1024 image will take 2s. This is unsuitable under real-time operating constraints: by the time a frame has been processed, the object may have moved. An analogous problem occurs when controlling robot cameras that need to scan scenes in search of target objects. In this paper, we consider a method for improving the run-time of general-purpose object-detection algorithms. Our method is based on a model of visual search in humans, which schedules eye fixations to maximize the long-term information accrued about the location of the target of interest. The approach can be used to drive robot cameras that physically scan scenes or to improve the scanning speed for very large high-resolution images. We consider the latter application in this work by simulating a "digital fovea" and sequentially placing it in various regions of an image in a way that maximizes the expected information gain. We evaluate the approach using the OpenCV version of the Viola-Jones face detector. After accounting for all computational overhead introduced by the fixation controller, the approach doubles the speed of the standard Viola-Jones detector at little cost in accuracy.
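A minimal sketch of the digital-fovea idea, using OpenCV's CascadeClassifier: the Viola-Jones detector runs only inside a small window placed at successive fixation points rather than over the full frame. The fixation order used here (center-out over a coarse grid, stopping at the first detection) is a simplification standing in for the paper's expected-information-gain controller, and scene.jpg is a hypothetical input image.

import cv2

# The frontal-face cascade ships with opencv-python under cv2.data.haarcascades.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def foveated_search(gray, fovea=256, grid=(4, 5)):
    """Place a square 'digital fovea' at grid fixation points (ordered
    center-out as a crude prior) and run Viola-Jones only inside it."""
    h, w = gray.shape
    centers = [((gx + 0.5) * w / grid[1], (gy + 0.5) * h / grid[0])
               for gy in range(grid[0]) for gx in range(grid[1])]
    centers.sort(key=lambda c: (c[0] - w / 2) ** 2 + (c[1] - h / 2) ** 2)
    for cx, cy in centers:
        x0 = int(max(cx - fovea / 2, 0))
        y0 = int(max(cy - fovea / 2, 0))
        x1, y1 = min(x0 + fovea, w), min(y0 + fovea, h)
        patch = gray[y0:y1, x0:x1]
        faces = cascade.detectMultiScale(patch, scaleFactor=1.1, minNeighbors=4)
        if len(faces) > 0:
            # Map detections from fovea coordinates back to full-image coordinates.
            return [(x + x0, y + y0, bw, bh) for (x, y, bw, bh) in faces]
    return []

if __name__ == "__main__":
    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input
    if img is not None:
        print(foveated_search(img))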
Abstract-Modeling eye-movements during search is important for building intelligent robotic vision systems, and for understanding how humans select relevant information and structure behavior in real time. Previous models of visual search (VS) rely on the idea of "saliency maps" which indicate likely locations for targets of interest. In these models the eyes move to locations with maximum saliency. This approach has several drawbacks: (1) It assumes that oculomotor control is a greedy process, i.e., every eye movement is planned as if no further eye movements would be possible after it. (2) It does not account for temporal dynamics and how information is integrated over time. (3) It does not provide a formal basis to understand how optimal search should vary as a function of the operating characteristics of the visual system. To address these limitations, we reformulate the problem of VS as an Information-gathering Partially Observable Markov Decision Process (I-POMDP). We find that the optimal control law depends heavily on the Foveal-Peripheral Operating Characteristic (FPOC) of the visual system.
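One common way to write the information-gathering objective sketched in this abstract, in illustrative notation that is not necessarily the paper's: b_t is the belief over the target location x, a_t the fixation point, o_t the observation returned at that fixation, and H the Shannon entropy. The FPOC would enter through the observation likelihood p(o_t | x, a_t), whose reliability falls off with distance from the fovea.

\begin{align*}
  b_{t+1}(x) &\propto p(o_t \mid x, a_t)\, b_t(x), \\
  \pi^{*} &= \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T-1} \big( H(b_t) - H(b_{t+1}) \big) \right]
           = \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ H(b_0) - H(b_T) \right].
\end{align*}

Unlike a saliency-map policy that greedily maximizes only the one-step entropy reduction, the I-POMDP policy optimizes the information accrued over the whole fixation sequence.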