Abstract-The problem of object recognition has not yet been solved in its general form. The most successful approach to it so far relies on object models obtained by training a statistical method on visual features obtained from camera images. The images must necessarily come from huge visual datasets, in order to circumvent all problems related to changing illumination, point of view, etc.We hereby propose to also consider, in an object model, a simple model of how a human being would grasp that object (its affordance). This knowledge is represented as a function mapping visual features of an object to the kinematic features of a hand while grasping it. The function is practically enforced via regression on a human grasping database.After describing the database (which is publicly available) and the proposed method, we experimentally evaluate it, showing that a standard object classifier working on both sets of features (visual and motor) has a significantly better recognition rate than that of a visual-only classifier.
In this paper, we propose a sparse coding approach to background modeling. The obtained model is based on dictionaries which we learn and keep up to date as new data are provided by a video camera. We observe that, without dynamic events, video frames may be seen as noisy data belonging to the background. Over time, such background is subject to local and global changes due to variable illumination conditions, camera jitter, stable scene changes, and intermittent motion of background objects. To capture the locality of some changes, we propose a space-variant analysis where we learn a dictionary of atoms for each image patch, the size of which depends on the background variability. At run time, each patch is represented by a linear combination of the atoms learnt online. A change is detected when the atoms are not sufficient to provide an appropriate representation, and stable changes over time trigger an update of the current dictionary. Even if the overall procedure is carried out at a coarse level, a pixel-wise segmentation can be obtained by comparing the atoms with the patch corresponding to the dynamic event. Experiments on benchmarks indicate that the proposed method achieves very good performances on a variety of scenarios. An assessment on long video streams confirms our method incorporates periodical changes, as the ones caused by variations in natural illumination. The model, fully data driven, is suitable as a main component of a change detection system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.