2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.176

Modeling Sub-Event Dynamics in First-Person Action Recognition

Abstract: First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities, and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics …
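
The core idea in the abstract, pooling per-frame features over temporal sub-intervals and concatenating the results, can be illustrated with a minimal sketch. The function name, the number of sub-events, and the numpy pooling operators below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def pool_sub_events(features, num_subevents=4, pool="max"):
    """Pool per-frame features over temporal sub-intervals.

    features: (T, D) array of per-frame descriptors.
    Returns a (num_subevents * D,) vector: one pooled descriptor
    per sub-interval, concatenated in temporal order.
    """
    segments = np.array_split(features, num_subevents, axis=0)
    op = np.max if pool == "max" else np.sum
    return np.concatenate([op(seg, axis=0) for seg in segments])

# Example: 120 frames of 256-d features -> a 4 x 256 video descriptor.
video = np.random.rand(120, 256)
descriptor = pool_sub_events(video, num_subevents=4, pool="max")
assert descriptor.shape == (4 * 256,)
```

Concatenating per-interval descriptors, rather than pooling the whole clip at once, is what preserves the ordering of sub-events (preceding event, transition, post-event impact) in the final representation.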

Cited by 27 publications (10 citation statements)
References 50 publications (118 reference statements)

“…Another line of research consists of developing effective techniques for extracting temporal information present in the video. In [48], [73] features are extracted from a series of frames to perform temporal pooling with different operations, including max pooling, sum pooling, or histogram of gradients. Then, a temporal pyramid structure allows the encoding of both long term and short term characteristics.…”
Section: First-Person Action Recognition
confidence: 99%
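
The pooling-plus-pyramid scheme this excerpt attributes to [48], [73] can be sketched as follows. This is a minimal illustration assuming each pyramid level l splits the clip into 2**l equal sub-intervals; the level count and numpy pooling operators are assumptions, not the cited papers' exact design:

```python
import numpy as np

def temporal_pyramid_pool(features, levels=3, pool="max"):
    """Temporal pyramid pooling over a (T, D) feature time series.

    Level l splits the clip into 2**l sub-intervals and pools each;
    coarse levels capture long-term structure, fine levels short-term.
    """
    op = np.max if pool == "max" else np.sum
    pooled = []
    for level in range(levels):
        for seg in np.array_split(features, 2 ** level, axis=0):
            pooled.append(op(seg, axis=0))
    return np.concatenate(pooled)

# 120 frames of 256-d features, 3 levels -> (1 + 2 + 4) * 256 dims.
video = np.random.rand(120, 256)
desc = temporal_pyramid_pool(video)
assert desc.shape == (7 * 256,)
```
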
“…Fully convolutional approaches viewed action recognition as a learning-based problem with CNNs being used as appearance [24], [57] and motion [58] feature extractors. More data hungry methods used multi-stream deep networks that utilized optical flow alongside RGB images as input modalities [21], [38], [59], [60], [61] to be able to focus on motion.…”
Section: Video Activity Recognition
confidence: 99%
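
A toy late-fusion model in the spirit of the RGB + optical-flow multi-stream networks this excerpt cites. The tiny convolutional streams, the 10-frame flow stack, and score-averaging fusion are illustrative assumptions, not any specific cited architecture:

```python
import torch
import torch.nn as nn

class TwoStreamNet(nn.Module):
    """Late-fusion two-stream model: one CNN over an RGB frame, one
    over a stack of optical-flow fields; class scores are averaged."""
    def __init__(self, num_classes=10, flow_stack=10):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 7, stride=2, padding=3), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, num_classes),
            )
        self.rgb = stream(3)                # appearance stream
        self.flow = stream(2 * flow_stack)  # motion stream: x/y flow per frame

    def forward(self, rgb, flow):
        return (self.rgb(rgb) + self.flow(flow)) / 2  # average fusion

net = TwoStreamNet()
rgb = torch.randn(1, 3, 224, 224)
flow = torch.randn(1, 20, 224, 224)
print(net(rgb, flow).shape)  # torch.Size([1, 10])
```
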
“…These features include the location of hands [30]- [32], the interaction with active/passive objects [33]- [38], the head motion [39], [40], the gaze [41]- [45], or a combination of them [46]- [48]. Other methods have explored egocentric contexts like social interactions [49] and the temporal structure of the activities [50]- [52]. Additionally, some approaches have adapted deep third-person action recognition methods [53]- [55] and developed new ones based on reinforcement learning [7].…”
Section: B. Activity Recognition From Egocentric Videos
confidence: 99%