2020
DOI: 10.1145/3402447
Am I Done? Predicting Action Progress in Videos

Abstract: In this article, we deal with the problem of predicting action progress in videos. We argue that this is an extremely important task, since it can be valuable for a wide range of interaction applications. To this end, we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during its execution. To provide a general de…
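The abstract describes predicting how far an action has progressed during its execution. A minimal sketch of the supervision signal such a model could regress to, assuming progress grows linearly between a known action start and end frame (the function name and the linear-target assumption are illustrative, not the authors' exact ProgressNet formulation):

```python
def progress_targets(num_frames, start, end):
    """Per-frame action progress targets in [0, 1].

    Frames at or before `start` get 0.0, frames at or after `end`
    get 1.0, and frames inside the action interval progress linearly.
    """
    targets = []
    for t in range(num_frames):
        if t <= start:
            p = 0.0
        elif t >= end:
            p = 1.0
        else:
            p = (t - start) / (end - start)
        targets.append(p)
    return targets
```

A model would then predict one such value per frame, so "Am I done?" reduces to checking whether the predicted progress approaches 1.0.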

Cited by 18 publications (17 citation statements)
References 55 publications (107 reference statements)
“…Aliakbarian et al. [5] proposed a two-stage LSTM architecture which models context and action to perform early action recognition. Becattini et al. [50] designed ProgressNet, an approach capable of estimating the progress of actions and localizing them in space and time. De Geest and Tuytelaars [51] addressed early action recognition by proposing a "feedback network" which uses two LSTM streams to interpret feature representations and model the temporal structure of subsequent observations.…”
Section: Early Action Recognition in Third Person Vision
Citation type: mentioning (confidence: 99%)
“…The early action recognition task [22], [23], [24], [1] is to recognize an ongoing action as early as possible from partial observations. In this task, the model is only allowed to observe part of the action video and must predict the action based on the observed segment [25], [26].…”
Section: B. Early Action Recognition
Citation type: mentioning (confidence: 99%)
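Early action recognition, as described above, evaluates a model on partial observations of a video. A minimal sketch of how a frame sequence might be truncated to a given observation ratio before being fed to a classifier (the helper name is hypothetical, not from the cited works):

```python
def partial_observation(frames, ratio):
    """Return the first `ratio` fraction of a frame sequence,
    always keeping at least one frame so the model has some input."""
    k = max(1, int(len(frames) * ratio))
    return frames[:k]
```

Benchmarks in this setting typically report accuracy at several ratios (e.g. 10%, 30%, 50% of the video observed) to show how early the correct label emerges.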
“…Multi-stream architectures have been widely employed for action [20,22,23,[32][33][34] and gesture recognition [12][13][14][15][16]27,35]. This technique consists of processing different versions of the same video in parallel with two or more CNNs.…”
Section: Multi-stream Gesture Recognition
Citation type: mentioning (confidence: 99%)
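One common way to combine the parallel CNN streams mentioned above is late fusion, i.e. averaging the per-class scores each stream produces. A minimal sketch (averaging is one of several fusion choices and is not necessarily the scheme used by the cited works):

```python
def fuse_streams(score_lists):
    """Late fusion: average per-class scores from several streams.

    `score_lists` is a list of equal-length score vectors, one per
    stream (e.g. RGB stream and optical-flow stream).
    """
    n = len(score_lists)
    return [sum(class_scores) / n for class_scores in zip(*score_lists)]
```

The fused vector is then argmax'd as usual; weighted averaging or a learned fusion layer are common alternatives.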
“…However, gesture spotting is needed for practical applications, since the duration and temporal boundaries of gestures are commonly unknown in practice [17,18]. It is worth noting that temporal action proposal generation (TAPG) is similar to gesture spotting and has received more attention from the research community [19][20][21][22][23][24][25][26]. TAPG generates video segment proposals (candidates) that may contain human action instances from untrimmed videos.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
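TAPG, as defined above, produces candidate (start, end) segments from an untrimmed video. A minimal sliding-window sketch of proposal enumeration (the scales and stride are illustrative; the cited TAPG methods typically score or learn proposals rather than just enumerate them):

```python
def sliding_window_proposals(num_frames, scales=(8, 16), stride=4):
    """Generate (start, end) temporal segment proposals.

    For each window length in `scales`, slide a window over the
    video with the given frame `stride`; every placement becomes
    one candidate segment that may contain an action instance.
    """
    proposals = []
    for scale in scales:
        for start in range(0, num_frames - scale + 1, stride):
            proposals.append((start, start + scale))
    return proposals
```

A downstream model would then rank these candidates by actionness and suppress overlapping ones.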