2020
DOI: 10.1007/978-3-030-58621-8_20

Procedure Planning in Instructional Videos

Cited by 40 publications (59 citation statements) · References 28 publications
“…In the vision paper of [23], it was recognised that the vast amount of online resources such as videos can be used to drive ubiquitous computing. In fact, the study of [24] suggests harnessing videos to segment their frames (cooking steps) and transform them into a latent space where a Markov Decision Process algorithm is employed to learn the sequence of cooking steps; essentially a planning algorithm that can potentially narrate a cooking workflow.…”
Section: B. An Abstract IoT Cooking Workflow Should Respect (mentioning)
confidence: 99%
“…To plan in unknown environments, the agent needs to learn the environment dynamics from previous experiences. Recent model-based RL schemes have shown promise that deep networks can learn a transition model directly from low-dimensional observations and plan with the learned model [40,6,11]. A closely related method is Universal Planning Networks (UPN) [32] that learns a plannable latent space with gradient descent by minimizing an imitation loss, i.e., learned from an expert planner.…”
Section: Related Work (mentioning)
confidence: 99%
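The statement above describes planning with a dynamics model learned in a latent space. As a minimal illustrative sketch (not the cited authors' method; all names, dimensions, and the stubbed "learned" model are assumptions), one can plan by greedy one-step lookahead with a transition function `f(state, action)`, picking at each step the action whose predicted next latent state lies closest to the goal embedding:

```python
import random

random.seed(0)

DIM, N_ACTIONS, HORIZON = 4, 3, 5

# Hypothetical per-action offsets standing in for a learned dynamics model.
offsets = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_ACTIONS)]

def transition(state, action):
    """Stub for a learned model f(s, a) -> s': here, add a fixed offset."""
    return [s + o for s, o in zip(state, offsets[action])]

def dist(a, b):
    """Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def plan(start, goal, horizon=HORIZON):
    """Greedy lookahead: at each step choose the action whose predicted
    next latent state is closest to the goal embedding."""
    state, actions = start, []
    for _ in range(horizon):
        best = min(range(N_ACTIONS),
                   key=lambda a: dist(transition(state, a), goal))
        actions.append(best)
        state = transition(state, best)
    return actions

print(plan([0.0] * DIM, [1.0] * DIM))  # a sequence of discrete action indices
```

In practice the transition model is a deep network trained on observed step sequences, and the greedy search is replaced by beam search or gradient-based planning, as in the Universal Planning Networks line of work mentioned above.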
“…In this paper, we focus on learning goal-directed actions from instructional videos. Recently, Chang et al. [6] proposed a new problem known as procedure planning in instructional videos. It requires a model to 1) plan a sequence of verb-argument actions and 2) retrieve the intermediate steps for achieving a given visual goal in real-life tasks such as making a strawberry cake (see Fig.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instructional video understanding. Beyond image semantics (Yatskar et al., 2016), unlike existing tasks for learning from instructional video (Zhou et al., 2018c; Tang et al., 2019; Alayrac et al., 2016; Song et al., 2015; Sener et al., 2015; Huang et al., 2016; Sun et al., 2019b,a; Plummer et al., 2017; Palaskar et al., 2019), combining video & text information in procedures (Yagcioglu et al., 2018; Fried et al., 2020), visual-linguistic reference resolution (Huang et al., 2018, 2017), visual planning (Chang et al., 2019), joint learning of objects and actions (Richard et al., 2018; Gao et al., 2017; Damen et al., 2018b), and pretraining joint embeddings of high-level sentences with video clips (Sun et al., 2019b; Miech et al., 2019), our task proposal requires explicit structured knowledge tuple extraction.…”
Section: Related Work (mentioning)
confidence: 99%