2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2016.69
Learning Action Maps of Large Environments via First-Person Vision

Abstract: When people observe and interact with physical spaces, they are able to associate functionality to regions in the environment. Our goal is to automate dense functional understanding of large spaces by leveraging sparse activity demonstrations recorded from an ego-centric viewpoint. The method we describe enables functionality estimation in large scenes where people have behaved, as well as novel scenes where no behaviors are observed. Our method learns and predicts "Action Maps", which encode the ability for a…

Cited by 34 publications (28 citation statements) | References 23 publications
“…Immediate effort is also expected in action/gesture localization in long, untrimmed, and realistic videos [128,34,95]. As such, we envision newer problems like early recognition [28], multi-task learning [127], captioning, recognition from low resolution sequences [66] and lifelog devices [87] will receive attention in the next years.…”
Section: Discussion
confidence: 99%
“…Environmental representations that describe both the data related to the object and to the action in the environment have been proposed. Action maps have been proposed as environment representations focusing on actions, which embed the action possibility in real space based on the history of human activities [14], [15]. In addition, another approach is to apply an object classification method based on the concept of affordance to associate the actions with objects [16], [17].…”
Section: B. Environmental Representation Methods
confidence: 99%
“…Additional effort is expected to advance in the research of methods able to simultaneously perform both detection and recognition tasks in long, realistic videos (Gkioxari and Malik, 2015;Shou et al, 2016b). As such, we envision other related problems like early recognition Escalante et al (2016a), multi task learning , captioning, recognition from low resolution sequences Nasrollahi et al (2015) and from lifelog devices Rhinehart and Kitani (2016) will receive special attention within the next few years.…”
Section: Future Work
confidence: 99%