2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8968278
Learning Actions from Human Demonstration Video for Robotic Manipulation

Abstract: Learning actions from human demonstration is an emerging trend for designing intelligent robotic systems, an approach that can be referred to as video to command. The performance of such an approach relies heavily on the quality of video captioning. However, general video captioning methods focus more on understanding the full frame and lack consideration of the specific objects of interest in robotic manipulation. We propose a novel deep model to learn actions from human demonstration video for robotic manipulat…
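The abstract describes mapping a demonstration video to a command sentence. As a rough illustration only, the sketch below shows a generic video-to-command pipeline in PyTorch: a recurrent encoder over precomputed frame features and a word-level decoder. The architecture, module names, and dimensions are assumptions for illustration and do not reproduce the paper's actual model (in particular, its handling of objects of interest).

```python
# Minimal sketch of a video-to-command pipeline (illustrative assumption,
# not the paper's architecture).
import torch
import torch.nn as nn

class VideoToCommand(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, vocab_size=100):
        super().__init__()
        # Per-frame visual features are assumed to be precomputed (e.g. by a CNN).
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, frame_feats, command_tokens):
        # frame_feats: (batch, num_frames, feat_dim)
        # command_tokens: (batch, seq_len) integer word indices
        _, h = self.encoder(frame_feats)      # summarize the demonstration video
        emb = self.embed(command_tokens)      # embed the target command words
        dec_out, _ = self.decoder(emb, h)     # decode conditioned on the video state
        return self.out(dec_out)              # per-step vocabulary logits

# Usage with random tensors, just to show the expected shapes.
model = VideoToCommand()
feats = torch.randn(2, 30, 512)               # 2 clips, 30 frames each
tokens = torch.randint(0, 100, (2, 8))        # 8-word command sequences
logits = model(feats, tokens)                 # -> (2, 8, 100)
```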

Cited by 24 publications (8 citation statements) | References 36 publications
“…Nguyen et al. [3], [5] proposed to caption human actions into command sentences, which can be used to control robotic actions. Similar works aim to improve the capabilities of vision-language models in robotic settings for problems such as human-robot interaction [24]-[26] and action learning and planning [1], [2], [4], [27], [28]. However, the evaluation of these methods usually involves: (1) sampling a small fixed number of frames, which is not suitable when intermediate feedback is continuously requested in a real-time video stream; or (2) heavy reliance on object detection, which is only weakly associated with manipulation contexts.…”
Section: B. Vision and Language in Robotics
confidence: 82%
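The evaluation gap this citation points out, a small fixed number of sampled frames versus continuous feedback on a live stream, can be made concrete with a small sketch. Both helpers below (sample_fixed_frames, stream_windows) and their parameters are hypothetical, not taken from the cited works.

```python
# Illustrative contrast: clip-level frame sampling vs. sliding-window streaming.
import numpy as np

def sample_fixed_frames(video, num_frames=8):
    """Pick num_frames uniformly spaced frames from a finished clip."""
    idx = np.linspace(0, len(video) - 1, num_frames).astype(int)
    return [video[i] for i in idx]

def stream_windows(frames, window=16, stride=4):
    """Yield overlapping windows from a live frame stream, so a model can be
    queried for intermediate feedback before the clip ends."""
    buf = []
    for frame in frames:
        buf.append(frame)
        if len(buf) == window:
            yield list(buf)
            buf = buf[stride:]

clip = list(range(100))                               # toy "frames"
print(len(sample_fixed_frames(clip)))                 # 8 frames for the whole clip
print(sum(1 for _ in stream_windows(iter(clip))))     # many overlapping windows
```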
“…Intelligent robots face challenges in: (1) interpreting sensor inputs of vision and force-contact interactions through modeling and learning from daily-life knowledge; and (2) performing intelligent actions that take into account the surrounding physical environment as well as human intention. Various studies [1], [2], [4], [6]-[13] have structured and planned manipulation actions and activities for robots in ways similar to human thinking; however, it is still challenging to extract contextual knowledge directly from daily life.…”
Section: Introduction
confidence: 99%
“…With the advent of deep learning (LeCun et al., 2015; Goodfellow et al., 2016), it became possible to learn visual features characterising the task directly from raw RGB videos. The features are extracted from raw videos using a variety of methods: deep metric learning (Sermanet et al., 2018), generative adversarial learning (Stadie et al., 2017), domain translation (Liu et al., 2018; Smith et al., 2019; Sharma et al., 2019), transfer learning (Sharma et al., 2018; Sermanet et al., 2017), action primitives (Jia et al., 2020), predictive modelling (Tow et al., 2017), video-to-text translation (Yang et al., 2019), and meta-learning (Yu et al., 2018a; Yu et al., 2018b). A comparison of these methods is given in Table 1, and a detailed study can be found in (Pauly, 2021).…”
Section: Related Work
confidence: 99%
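The step these cited approaches share is turning a raw RGB clip into a sequence of per-frame feature vectors that the listed methods (metric learning, domain translation, video-to-text, meta-learning, ...) then build on. The tiny frame encoder below is a stand-in assumption; the cited works use their own, typically pretrained, backbones.

```python
# Minimal sketch: raw RGB clip -> per-frame feature sequence (illustrative encoder).
import torch
import torch.nn as nn

frame_encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),        # -> (num_frames, 32)
)

def encode_clip(clip):
    """clip: (num_frames, 3, H, W) raw RGB frames -> (num_frames, 32) features."""
    with torch.no_grad():
        return frame_encoder(clip)

clip = torch.rand(30, 3, 64, 64)                  # a 30-frame RGB clip
features = encode_clip(clip)                      # (30, 32) feature sequence
print(features.shape)
```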
“…Traditional rigid robots, e.g. dexterous hands (Mattar, 2013; Rebollo et al., 2017) and robotic arms (Yang et al., 2019; Golluccio et al., 2020), are generally composed of stiff materials and bend by actuating discrete joints. Soft robots, e.g.…
Section: Introduction
confidence: 99%