2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.278
Procedural Generation of Videos to Train Deep Action Recognition Networks

Cited by 103 publications (76 citation statements)
References 53 publications
“…To generalize to in-the-wild images, Mehta et al. [50] proposed a 2D-to-3D knowledge transfer, i.e., using pre-trained 2D pose networks to initialize the 3D pose regression networks, while in [51] common representations are shared between the 2D and 3D tasks. To compensate for the lack of large-scale in-the-wild datasets, recent work has also proposed to generate training images for particular 3D pose datasets such as the CMU MoCap dataset [6] by stitching image regions [8], animating human 3D models [7], [52], using a game engine [53], or by rendering textured 3D body scans [54], [55]. These synthetic datasets have proved useful for training CNN architectures, though they often require a domain adaptation stage.…”
Section: 3D Human Pose From a Single Image
confidence: 99%
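The 2D-to-3D knowledge transfer described in the statement above amounts to initializing the layers a 3D pose network shares with a pre-trained 2D pose network, while the task-specific heads keep their own initialization. A minimal sketch, assuming hypothetical weight dictionaries keyed by layer name (`transfer_backbone_weights`, the `backbone.` prefix, and the toy shapes are illustrative, not taken from the cited papers):

```python
import numpy as np

def transfer_backbone_weights(pose2d_weights, pose3d_weights, prefix="backbone."):
    """Copy shared-backbone weights from a pre-trained 2D pose network into a
    3D pose network's state; head layers keep their existing initialization.
    (Hypothetical helper for illustration; not the cited papers' code.)"""
    transferred = dict(pose3d_weights)
    for name, w in pose2d_weights.items():
        if name.startswith(prefix) and name in transferred:
            transferred[name] = w.copy()  # shared layer: take 2D-trained weights
    return transferred

# Toy example: one shared backbone layer plus task-specific heads.
rng = np.random.default_rng(0)
w2d = {"backbone.conv1": rng.normal(size=(3, 3)), "head2d.fc": rng.normal(size=(3,))}
w3d = {"backbone.conv1": np.zeros((3, 3)), "head3d.fc": rng.normal(size=(4,))}

init3d = transfer_backbone_weights(w2d, w3d)
```

In a real framework this corresponds to loading a partial checkpoint with non-strict matching, so only overlapping layer names are transferred.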
“…Another increasingly popular way to overcome the lack of large-scale datasets is the use of synthetic data, such as VEIS [28], SYNTHIA [29], Virtual KITTI [30], and GTA-V [31]. Synthetic data is usually used to augment real training data [29], [32]. The SYNTHIA dataset is generated by rendering a virtual city created with the Unity development platform for semantic segmentation of driving scenes.…”
Section: Related Work
confidence: 99%
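Augmenting scarce real data with a large synthetic set, as described above, is often implemented by oversampling the small real set so each epoch sees a controlled mix. A minimal sketch under stated assumptions (the dataset lists, `mix_synthetic_and_real`, and the 50/50 target ratio are all hypothetical, not from the cited works):

```python
import random

def mix_synthetic_and_real(synthetic, real, real_fraction=0.5, seed=0):
    """Build a shuffled epoch schedule in which the small real set is
    oversampled (with replacement) so roughly `real_fraction` of the
    samples are real. Illustrative helper, not a library API."""
    rng = random.Random(seed)
    # Number of real samples needed to reach the target fraction.
    n_real_target = int(len(synthetic) * real_fraction / (1.0 - real_fraction))
    oversampled_real = [real[rng.randrange(len(real))] for _ in range(n_real_target)]
    schedule = list(synthetic) + oversampled_real
    rng.shuffle(schedule)
    return schedule

synthetic = [("synt", i) for i in range(1000)]  # e.g. rendered frames
real = [("real", i) for i in range(50)]         # scarce annotated frames
schedule = mix_synthetic_and_real(synthetic, real)
```

A domain adaptation stage (as the first statement notes) would typically follow or replace this naive mixing when the synthetic-to-real gap is large.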
“…Khodabandeh et al. [13] provide a method to automatically generate an action recognition dataset by partitioning a video into action, subject, and context. Souza et al. [14] proposed a database of simulated human actions: they used motion capture data containing action annotations, combined with 3D human models in a simulated environment, and showed that this improves action recognition rates when combined with a small amount of annotated real-world data.…”
Section: Related Work
confidence: 99%