Where2Act: From Pixels to Actions for Articulated 3D Objects

Mo, Kaichun; Guibas, Leonidas J.; Mukadam, Mustafa; Gupta, Abhinav; Tulsiani, Shubham

doi:10.1109/iccv48922.2021.00674

Cited by 85 publications

(46 citation statements)

References 34 publications


“…Researchers have used other forms of supervision (strong supervision, weak supervision, imitation learning, reinforcement learning, inverse reinforcement learning) to build interactive understanding of objects. This can be in the form of learning a) where and how to grasp [9,21,26,27,30,35,36,39,43,51], b) state classifiers [25], c) interaction hotspots [15,42,44,61], d) spatial priors for action sites [46], e) object articulation modes [12,38], f) reward functions [29,31,50,52], g) functional correspondences [34]. While our work pursues similar goals, we differ in our supervision source (observation of human hands interacting with objects in egocentric videos).…”

Section: Related Work (mentioning)

confidence: 99%

Human Hands as Probes for Interactive Object Understanding

Goyal, Modi, Goyal et al. 2021

Preprint


Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observation of what human hands interact with and how can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset, and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.



“…This is crucial because in the beginning, when there are much fewer positive examples than negative examples and the dataset is imbalanced, the model may converge to a suboptimal solution in which all values in the output are close to 0. This technique is also used in other work with similar problems [30,48]. We use a similar strategy for balancing data across different tasks.…”

Section: Training (mentioning)

confidence: 99%

Tool as Embodiment for Recursive Manipulation

Noguchi, Matsushima, Matsuo et al. 2021

Preprint


Humans and many animals exhibit a robust capability to manipulate diverse objects, often directly with their bodies and sometimes indirectly with tools. Such flexibility is likely enabled by the fundamental consistency in underlying physics of object manipulation such as contacts and force closures. Inspired by viewing tools as extensions of our bodies, we present Tool-As-Embodiment (TAE), a parameterization for tool-based manipulation policies that treat hand-object and tool-object interactions in the same representation space. The result is a single policy that can be applied recursively on robots to use end effectors to manipulate objects, and use objects as tools, i.e. new end effectors, to manipulate other objects. By sharing experiences across different embodiments for grasping or pushing, our policy exhibits higher performance than if separate policies were trained. Our framework could utilize all experiences from different resolutions of tool-enabled embodiments to a single generic policy for each manipulation skill. Videos at https://sites.google.com/view/recursivemanipulation


“…Prior work has avoided this roadblock in two ways: (1) with human supervision [19,2]; or (2) by greatly constraining the space of possible actions [24,7,8,21,20]. Although labelled data (e.g., keypoint annotations where an object should be grasped and interacted with) remove the need to sample actions, they can be expensive, time-consuming to collect and may encode irrelevant human biases.…”

Section: Introduction (mentioning)

confidence: 99%