2015
DOI: 10.1007/978-3-319-16814-2_12

Action Recognition in the Presence of One Egocentric and Multiple Static Cameras

Cited by 31 publications (19 citation statements); References 34 publications
“…Second, action recognition models in the literature rely on computer-vision based approaches to analyze 2D videos recorded by an egocentric camera, e.g., (Fathi et al, 2011 , 2012 ; Fathi and Rehg, 2013 ; Matsuo et al, 2014 ; Soran et al, 2015 ; Ma et al, 2016 ; Li et al, 2018 ; Furnari and Farinella, 2019 ; Sudhakaran et al, 2019 ; Liu et al, 2020 ). Whether using hand-crafted features (Fathi et al, 2011 , 2012 ; Fathi and Rehg, 2013 ; Matsuo et al, 2014 ; Soran et al, 2015 ; Ma et al, 2016 ; Furnari and Farinella, 2019 ) or learning end-to-end models (Li et al, 2018 ; Sudhakaran et al, 2019 ; Liu et al, 2020 ), the computer vision-based approaches to action recognition must also address the challenges of identifying and tracking activity-relevant objects. In contrast, we bypassed the challenges inherent in 2D image analysis by combining an eyetracker with a marker-based motion capture system.…”
Section: Discussion (mentioning)
confidence: 99%
“…Second, action recognition models in the literature rely on computer-vision based approaches to analyze 2D videos recorded by an egocentric camera, e.g., (Fathi et al, 2011, 2012; Fathi and Rehg, 2013; Matsuo et al, 2014; Soran et al, 2015; Ma et al, 2016; Li et al, 2018; Furnari and Farinella, 2019; Sudhakaran et al, 2019; Liu et al, 2020). Whether using hand-crafted features (Fathi et al, 2011, 2012; Fathi and Rehg, 2013; Matsuo et al, 2014; Soran et al, 2015; Ma et al, 2016; Furnari and Farinella, 2019) or learning end-to-end models (Li et al, 2018; Sudhakaran et al, 2019; Liu et al, 2020), the computer vision-based approaches to action recognition must also address the challenges of identifying and tracking activity-relevant objects.…”
Section: Comparisons to State-of-the-Art Recognition Algorithms (mentioning)
confidence: 99%
“…Human reidentification by matching viewers in top-view and egocentric cameras have been tackled by establishing the correspondences between the views in [1]. Soran et al [29] utilize the information from one egocentric camera and multiple exocentric cameras to solve the action recognition task. Ardeshir et al [2] learn motion features of actions performed in ego-and exocentric domains to transfer motion information across the two domains.…”
Section: Relating Aerial and Ground-Level Images (mentioning)
confidence: 99%
“…Several recent papers have shown the potential for combining first-person video analysis with evidence from other types of synchronized video, including from other firstperson cameras [3,29], multiple third-person cameras [26], or even hand-mounted cameras [5]. However, these papers assume that a single person appears in each video, avoiding the person-level correspondence problem.…”
Section: Related Work (mentioning)
confidence: 99%
“…Despite its importance, we are aware of very little work that tries to address this problem. Several recent papers propose using multiple cameras for joint first-person recognition [3,5,26,29], but make simplistic assumptions like that only one person appears in the scene. Using visual SLAM to infer first-person camera trajectory and map to third-person cameras (e.g., [17,19]) works well in some settings, but can fail for crowded environments when longterm precise localizations are needed and when first-person video has significant motion blur.…”
Section: Introduction (mentioning)
confidence: 99%