2017 IEEE International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv.2017.234
Joint Discovery of Object States and Manipulation Actions

Abstract: Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost…
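The abstract frames the problem as minimizing a discriminative clustering cost over candidate state labels. As a rough illustration of that family of objectives, here is a minimal NumPy sketch of a DIFFRAC-style cost (a ridge-regression classifier solved in closed form, scored over candidate label matrices); the function name, toy data, and regularization choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def diffrac_cost(X, Y, lam=1e-3):
    """DIFFRAC-style discriminative clustering cost:
    min over a linear classifier W (ridge-regularized) of
    ||Y - XW - b||^2, evaluated in closed form for a given labeling Y.
    X: (n, d) per-frame features; Y: (n, k) candidate state labels.
    Lower cost = labels better explained by a linear classifier on X.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)            # centering absorbs the bias term b
    # Closed-form ridge 'hat' matrix: X (X^T X + n*lam*I)^{-1} X^T
    A = Xc @ np.linalg.solve(Xc.T @ Xc + n * lam * np.eye(d), Xc.T)
    B = np.eye(n) - A                   # residual projector
    return np.trace(Y.T @ B @ Y) / n

# Toy usage: compare two candidate state assignments for 6 frames
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y_flip = np.array([[1, 0]] * 3 + [[0, 1]] * 3, dtype=float)  # state changes mid-video
Y_alt = np.array([[1, 0], [0, 1]] * 3, dtype=float)          # states alternate
print(diffrac_cost(X, Y_flip), diffrac_cost(X, Y_alt))
```

In the paper's full model, additional constraints restrict which label matrices are admissible (e.g., the initial state precedes the manipulating action, which precedes the final state); the sketch above omits those constraints and only scores a given labeling.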

Cited by 67 publications (66 citation statements).
References 38 publications (92 reference statements).
“…Learning from instructional videos. Instructional videos are rising in popularity in the context of learning steps of complex tasks [2,16,41,42,46,68], visual-linguistic reference resolution [17,18], action segmentation in long untrimmed videos [66] and joint learning of object states and actions [3]. Related to our work, [2,30,62] also consider automatically generated transcription of narrated instructional videos as a source of supervision.…”
Section: Related Work
confidence: 99%
“…In Drawing, the model attends to specific parts of the sketch such as the head and mouth. The high activations in the Chopstick-Using task occur on the hand position (3,4), chopstick position (2) and the bean locations (1,2,3). Further qualitative results are shown in the supplementary video.…”
Section: Visualizing Performance Ranking
confidence: 83%
“…From Fig.7 we can see that the trained model is picking details that correspond to what a human would attend to. In Dough-Rolling high activations occur on holes in the dough (1,3), curved or rolled edges (4) and when using a spoon (2). High activations occur in Surgery when strain is put on the material (1, 2), with abnormal needle passes (3) and when there is loose stitching (4).…”
Section: Visualizing Performance Ranking
confidence: 99%
“…In liquid pouring sequences, the container and liquid state can be estimated from RGB inputs. Alayrac et al [6] model the interaction between actions and objects in a discrete manner. Some methods further demonstrate that the liquid amount can be estimated by combining a semantic segmentation CNN and an LSTM [34,7].…”
Section: Related Work
confidence: 99%
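The last statement points to methods that estimate liquid amount by combining a semantic segmentation CNN with an LSTM [34,7]. Below is a hedged sketch of that per-frame-encoder-plus-recurrence pattern; it is not the cited papers' architecture, and the tiny convolutional encoder is a hypothetical stand-in for a real segmentation network.

```python
import torch
import torch.nn as nn

class LiquidAmountEstimator(nn.Module):
    """Per-frame visual features -> LSTM -> liquid amount per frame.
    The toy conv encoder below is a placeholder for segmentation features."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)  # per-frame amount in [0, 1]

    def forward(self, frames):             # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)           # temporal smoothing over frames
        return torch.sigmoid(self.head(out)).squeeze(-1)  # (B, T)

# Toy usage: 2 clips of 8 frames at 64x64 -> per-frame amount estimates
model = LiquidAmountEstimator()
amounts = model(torch.randn(2, 8, 3, 64, 64))  # shape (2, 8)
```

A real system would replace the toy encoder with features from the segmentation network and train against per-frame ground-truth liquid amounts.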