Due to population ageing, the cost of health care will rise in the coming years. One way to help people, and especially the elderly, is to introduce domestic robots that assist in daily life so that they become less dependent on home care. Joint visual attention models can be used for natural human-robot interaction. Joint visual attention means that two humans, or a robot and a human, share attention to the same object; it can be established by pointing, by eye gaze, or through speech. The goal of this thesis is to develop a non-verbal joint visual attention model for object detection that integrates gestures, gaze, saliency and depth. The question answered in this report is: how can the information from gestures, gaze, saliency and depth be integrated most efficiently to determine the object of interest?

Existing joint visual attention models only work when the human stands in front of the robot, so that the human is in view of the camera. Our model should be more flexible than existing models: it needs to work in different configurations of human, robot and object. Furthermore, the joint visual attention model should be able to determine the object of interest even when the pointing direction or the gaze location is not available.

The saliency algorithm of Itti et al. [1] is used to create a bottom-up saliency map. The second bottom-up cue, depth, is determined by segmenting the environment to extract the objects. Apart from these bottom-up cues, top-down cues are used as well. The pointing finger is identified, and the pointing direction is retrieved from the eigenvalues and eigenvectors of the finger. A pointing map is created from the angle between the 3D pointing direction vector and the 3D vector from the pointing finger to the object. A hybrid model that computes a gaze map has been developed; it switches between a texture-based and a color-based approach depending on how textured the object is.

Depending on the configuration of human, robot and object, three or four maps are available to determine the object of interest. In some configurations the pointing map or the gaze map is not available; in that case the combined saliency map is obtained by pointwise multiplication of the three remaining maps. If all four maps are at our disposal, all maps are added and multiplied by the pointing mask.

When the human and robot stand opposite each other and the pointing, bottom-up saliency and depth maps are combined, 93.3% of the objects are detected correctly. If the human stands next to the robot and the gaze, bottom-up saliency and depth maps are combined, the detection rate is 67.8%. If robot, human and object stand in a triangular configuration, the detection rate is 96.3%.

The main contribution is that the joint visual attention model can detect objects of interest in different configurations of human, robot and object, and that it still works when one of the four cues is not available. Furthermore, a hybrid model has ...
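The fusion step described in the abstract is simple to sketch. Below is a minimal illustration of the two combination rules (pointwise product of three maps, or sum of four maps gated by the pointing mask), assuming the cue maps have already been computed as normalized 2D arrays; all function and variable names here are hypothetical, not from the thesis.

```python
import numpy as np

def fuse_maps(saliency, depth, pointing=None, gaze=None, pointing_mask=None):
    """Combine the available cue maps into one combined saliency map.

    All inputs are HxW float arrays normalized to [0, 1]; `pointing`,
    `gaze` and `pointing_mask` may be None when that cue is unavailable.
    """
    available = [m for m in (saliency, depth, pointing, gaze) if m is not None]
    if len(available) == 4 and pointing_mask is not None:
        # All four cues: add the maps, then gate by the pointing mask.
        combined = sum(available) * pointing_mask
    else:
        # Only three cues: pointwise multiplication of the available maps.
        combined = np.ones_like(saliency)
        for m in available:
            combined *= m
    return combined / (combined.max() + 1e-9)  # renormalize to [0, 1]

# The object of interest is then the location of the maximum response:
# y, x = np.unravel_index(np.argmax(combined), combined.shape)
```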
Novelty detection is essential for personal robots to continuously learn and adapt in open environments. This paper studies novelty detection in the context of action recognition. To detect unknown (novel) human action sequences, we propose a new method called background models, which is applicable to any generative classifier. Our closed-set action recognition system consists of a new skeleton-based feature combined with a Hidden Markov Model (HMM)-based generative classifier, which has shown good results in earlier action recognition work. Novelty detection is then approached from both a posterior-likelihood and a hypothesis-testing view, which we unify as background models. We investigate a diverse set of background models: sum over competing models, filler models, flat models, anti-models, and several reweighted combinations. Our standard recognition system has an inter-subject recognition accuracy of 96% on the Microsoft Research Action 3D dataset. Moreover, the novelty detection module combining anti-models with flat models achieves 78% novelty detection accuracy while maintaining 78% standard recognition accuracy. Our methodology can increase the robustness of any current HMM-based action recognition system in open environments, and is a first step towards an incrementally learning system.
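The background-model idea can be sketched as a likelihood-ratio test: a sequence is accepted as a known action only if its best class score beats a background score by some margin. A minimal illustration follows, assuming per-class HMMs that expose a `log_likelihood(sequence)` method; the names and the specific background score shown (log-sum over competitors plus a filler model) are one simple variant, not the paper's exact formulation.

```python
import numpy as np

def classify_with_novelty(sequence, class_hmms, filler_hmm, threshold):
    """Closed-set recognition plus novelty detection via a background model.

    `class_hmms` maps action labels to trained generative models; the
    filler model stands in for "any other action". A sequence whose best
    class log-likelihood does not exceed the background score by
    `threshold` (a log-likelihood-ratio test) is flagged as novel.
    """
    scores = {label: hmm.log_likelihood(sequence)
              for label, hmm in class_hmms.items()}
    best_label = max(scores, key=scores.get)
    # Background score: log-sum over competing models and the filler model.
    competitors = [s for label, s in scores.items() if label != best_label]
    background = np.logaddexp.reduce(
        competitors + [filler_hmm.log_likelihood(sequence)])
    if scores[best_label] - background < threshold:
        return "novel"
    return best_label
```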
In this paper we focus on a perception system for cognitive interaction between robots and humans, in particular for learning to recognize objects in household environments. To this end, we propose a novel three-layered framework for object learning that bridges the gap between the robot's low-level recognition capabilities and the higher cognitive level of humans, using a weighted fusion of multimodal sources such as chromatic, structural and spatial information. In the first layer, we ground the raw sensory information into semantic concepts for each modality. We obtain a semantic color representation by using SLIC superpixel segmentation followed by a mapping learned from online images with a PLSA model; this yields a probability distribution over basic color names derived from cognitive-linguistic studies. To represent structural information, we cluster ESF features obtained from point cloud data into primitive shape categories; this primitive shape knowledge is learned and expanded from the robot's experience. For spatial information, a metric map from the navigation system, demarcated into landmark locations, is used. All these semantic representations are compatible with a human's description of the environment, and the second layer uses them to generate probabilistic knowledge about the objects with random forest classifiers. In the third layer, we propose a novel weighted fusion of the obtained object probabilities, where the weights are derived from the robot's prior experience. We evaluate our system in realistic domestic conditions provided by a RoboCup@Home setting.
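The third-layer fusion can be sketched as a weighted combination of per-modality class posteriors, with the weights reflecting how reliable each modality has been for the robot so far. The following minimal illustration assumes a normalized weighted sum; the names and this particular fusion rule are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_modalities(probs, weights):
    """Weighted fusion of per-modality object-class probability vectors.

    probs   : dict modality -> length-K array (output of that modality's
              random forest, summing to 1 over the K object classes).
    weights : dict modality -> non-negative reliability weight derived
              from the robot's prior experience with that modality.
    """
    num_classes = next(iter(probs.values())).shape[0]
    fused = np.zeros(num_classes)
    total = sum(weights[m] for m in probs)
    for m, p in probs.items():
        fused += (weights[m] / total) * p
    return fused  # still a distribution over the K object classes

# Example: color is trusted more than shape in this environment.
fused = fuse_modalities(
    {"color":    np.array([0.7, 0.2, 0.1]),
     "shape":    np.array([0.3, 0.5, 0.2]),
     "location": np.array([0.4, 0.4, 0.2])},
    {"color": 0.5, "shape": 0.2, "location": 0.3},
)
```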