2010 IEEE International Conference on Robotics and Automation
DOI: 10.1109/robot.2010.5509319

Categorizing object-action relations from semantic scene graphs

Abstract: In this work we introduce a novel approach for detecting spatiotemporal object-action relations, leading to both action recognition and object categorization. Semantic scene graphs are extracted from image sequences and used to find the characteristic main graphs of the action sequence via an exact graph-matching technique, thus providing an event table of the action scene, which allows extracting object-action relations. The method is applied to several artificial and real action scenes containing li…
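To make the pipeline described in the abstract concrete, the sketch below shows one plausible reading of it: each frame yields a semantic scene graph (nodes are tracked segments, edges mean the segments touch), consecutive graphs are compared by exact graph matching, and frames where the relational structure changes become rows of an event table. This is an illustrative reconstruction under our own assumptions; networkx and all names here are our choices, not the paper's implementation.

```python
# Illustrative sketch of the scene-graph / event-table pipeline from the
# abstract. networkx and all names are assumptions, not the paper's code.
import networkx as nx
from networkx.algorithms.isomorphism import categorical_node_match

def scene_graph(touching_pairs):
    """Undirected semantic scene graph: one node per tracked segment,
    one edge per 'touching' relation in the current frame."""
    g = nx.Graph()
    for a, b in touching_pairs:
        g.add_node(a, label=a)
        g.add_node(b, label=b)
        g.add_edge(a, b)
    return g

NODE_MATCH = categorical_node_match("label", None)

def event_table(frames):
    """Keep only frames whose scene graph differs from the previous one
    under exact graph matching; each kept frame is one event-table row."""
    events, prev = [], None
    for t, pairs in enumerate(frames):
        g = scene_graph(pairs)
        if prev is None or not nx.is_isomorphic(prev, g, node_match=NODE_MATCH):
            events.append((t, sorted(g.edges())))
        prev = g
    return events

# Toy 'hand picks up a cup from the table' sequence:
frames = [
    [("cup", "table")],                   # hand free, cup resting on table
    [("hand", "cup"), ("cup", "table")],  # hand touches the cup
    [("hand", "cup")],                    # cup lifted off the table
]
for t, edges in event_table(frames):
    print(t, edges)
```

Only the frames at which the graph structure changes are kept, so the event table is a compact signature of the action regardless of how long each phase lasts.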

Cited by 66 publications (75 citation statements); references 16 publications.
“…Moreover, from the point of view of modeling and learning, this explanation is parsimonious and efficient as compared to modeling the object-object relationships [19] such as chair-keyboard, table-monitor, monitor-keyboard, etc. …”
Section: Orison Swett Marden (1894) (mentioning)
confidence: 99%
“…In most previous works, object detection and activity recognition have been addressed as separate tasks. Only recently, some works [9,32,1,25,20] have shown that modeling the interaction between human poses and objects in 2D images and videos results in better performance on the tasks of object detection and activity recognition. In contemporary work, Fouhey et al [6] and Delaitre et al [4] observe humans in videos for estimating 3D geometry and estimating affordances, respectively.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, it is robust to a considerable number of clutter nodes and edges that are unrelated to the action of interest. All these aspects are allowed to vary, and still the same SEC is observed and captures the "essence of the action", as demonstrated with diverse sets of real actions in our earlier work [2], [3].…”
Section: A. Semantic Event Chains (mentioning)
confidence: 99%
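The quoted invariance can be illustrated with a toy encoding of a Semantic Event Chain: one row per object pair, one column per event, entries being spatial relations. Rows that never change carry no information about the action, which is why clutter can be discarded. A minimal sketch under these assumptions; the relation symbols and names are illustrative, not the formulation of [2], [3]:

```python
# Hedged sketch of a Semantic Event Chain (SEC) as a relation matrix:
# one row per object pair, one column per event. 'T' = touching,
# 'N' = not touching; these encodings are our illustration only.
TOUCH, APART = "T", "N"

def semantic_event_chain(rows):
    """Drop rows whose relation never changes across events -- these are
    clutter pairs unrelated to the action, so the remaining matrix stays
    the same no matter how much static clutter the scene contains."""
    return {pair: rels for pair, rels in rows.items() if len(set(rels)) > 1}

rows = {
    ("hand", "cup"):    [APART, TOUCH, TOUCH],  # hand grasps the cup
    ("cup", "table"):   [TOUCH, TOUCH, APART],  # cup leaves the table
    ("chair", "floor"): [TOUCH, TOUCH, TOUCH],  # static clutter: removed
}
print(semantic_event_chain(rows))
```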
“…The robotic actor and its environment are simulated in Virtual Reality (VR). The evolution of the manipulation sequence is extracted and represented in the form of so-called Semantic Event Chains [2], [3]. From these, the ManipulationRecognition module extracts plausible actions in progress and recognizes completed actions.…”
Section: Introduction (mentioning)
confidence: 99%
“…The scenario we focus on in this paper is a human demonstrator teaching a robot about the affordances of objects by showing how they are used. To that end we will, in line with previous work [4], assume that the human is responsible for all the movement in the scene. Furthermore, we assume that the relationships of each pair of objects involved in an activity are dependent, and use a graphical model to model the correlation between all object-object interactions, in order to improve the recognition of functional classes of all objects in the scene and mitigate misleading information.…”
Section: Introduction (mentioning)
confidence: 95%
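The last excerpt's idea of correlating all object-object interactions can be sketched as a tiny pairwise model: unary evidence per object plus pairwise compatibilities between functional classes, maximized jointly by brute force over a small scene. All potentials and names below are made-up illustrations, not the cited authors' model:

```python
# Minimal sketch of joint functional-class labeling with unary evidence
# and pairwise compatibilities. Potentials are invented for illustration.
from itertools import product

objects = ["A", "B"]
classes = ["container", "tool"]

unary = {                      # per-object class evidence (log scores)
    ("A", "container"): 1.0, ("A", "tool"): 0.2,
    ("B", "container"): 0.4, ("B", "tool"): 0.6,
}
pairwise = {                   # compatibility of co-occurring classes
    ("container", "tool"): 0.8, ("tool", "container"): 0.8,
    ("container", "container"): 0.1, ("tool", "tool"): 0.1,
}

def best_labeling():
    """Exhaustively score every joint assignment; feasible here because
    the toy scene has only two objects and two classes."""
    best, best_score = None, float("-inf")
    for labels in product(classes, repeat=len(objects)):
        score = sum(unary[(o, c)] for o, c in zip(objects, labels))
        score += pairwise[(labels[0], labels[1])]
        if score > best_score:
            best, best_score = labels, score
    return dict(zip(objects, best)), best_score

print(best_labeling())  # pairwise term pulls B toward 'tool' given A
```

The point of the pairwise term is exactly the quote's claim: evidence about one object's class propagates to the others through the interaction model, which can override weak or misleading unary evidence.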