Virtual-to-Real: Learning to Control in Visual Semantic Segmentation

Hong, Zhang-Wei; Su, Shih-Yang; Shann, Tzu-Yun; Chang, Yuan‐Chieh; Yang, Hong-Tzer; Ho, Brian Hsi-Lin; Tu, Chih-Chieh; Chang, Yueh-Chuan; Hsiao, Tsu-Ching; Hsiao, Hsin-Wei; Lai, Sih-Pin; Lee, Chun-Yi

doi:10.48550/arxiv.1802.00285

Cited by 12 publications

(19 citation statements)

References 0 publications

“…For low level vision tasks, synthetic images have been employed for stereo vision [45] and optical flow estimation [5]. For higher level tasks, computer-aided design (CAD) models have also been extensively used for object detection [27,44,34] or segmentation [18]. Synthetic human figures have been extensively used for learning purposes, such as silhouette-based action recognition tasks [48], and crowd counting [63].…”

Section: Synthetic Human Pose Data (mentioning)

confidence: 99%

Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data

Liu, Naveen, Ostadabbas

2021

Preprint

The ultimate goal for an inference model is to be robust and functional in real-life applications. However, domain gaps between training and test data often degrade model performance. This issue is especially critical for the monocular 3D human pose estimation problem, in which 3D human data is often collected in a controlled lab setting. In this paper, we focus on alleviating the negative effect of domain shift by presenting our adapted human pose (AHuP) approach, which addresses adaptation problems in both appearance and pose spaces. AHuP is built around the practical assumption that in real applications, data from the target domain may be inaccessible, or only limited information can be acquired. We illustrate the 3D pose estimation performance of AHuP in two scenarios. In the first, source and target data differ significantly in both appearance and pose spaces: we learn from synthetic 3D human data (with zero real 3D human data) and show performance comparable to state-of-the-art 3D pose estimation models that have full access to the real 3D human pose benchmarks for training. In the second, source and target datasets differ mainly in the pose space: the AHuP approach can be applied to further improve the performance of state-of-the-art models when they are tested on datasets different from their training dataset.

Keywords: 3D human pose estimation • domain shift • semantic aware adaptation • synthetic human datasets.

“…The performance is excellent in simulation but barely satisfying in the real world due to the lack of modeling the noise in real depth images. The navigation model based on depth image in [23] behaves poorly for the same reason.…”

Section: Related Work (mentioning)

confidence: 99%

“…One implicit approach is to adopt an image segmentation network as semantic feature extraction layers and add new layers to output the control commands [24]. Another explicit approach with better performance is to generate a semantic segmentation image first and then feed it to another network to get waypoints [9] or velocity output [23], [25], [26]. The above works behave well when the simulated training environment is elaborate and the testing scenario is not cluttered.…”

Section: Related Work (mentioning)

confidence: 99%

“…The noise added to RGB image follows the approach in [1], including the change in contrast, tone and brightness and the addition of Gaussian noise, Gaussian blur, and salt-and-pepper noise. As to the generation of the two segmented images, PSPNet is trained with ADE20k dataset [43] as in [26] and FC-DenseNet is trained with CamVid dataset [44]. In simulation, the parameters are refined with a dataset we labeled to increase the accuracy.…”

Section: B Environment Representation (mentioning)

confidence: 99%
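The RGB noise model described in the statement above (contrast and brightness changes plus Gaussian noise, Gaussian blur, and salt-and-pepper noise) can be illustrated with a minimal NumPy sketch. This is not the cited authors' implementation: the function name `add_image_noise`, every parameter value, and the use of a small [1, 2, 1]/4 kernel as a stand-in for Gaussian blur are assumptions made for illustration, and the tone change mentioned in the statement is omitted.

```python
import numpy as np


def add_image_noise(rgb, rng, contrast=1.2, brightness=10.0,
                    gauss_sigma=5.0, blur_passes=1, sp_fraction=0.01):
    """Perturb a uint8 HxWx3 image; all default values are illustrative assumptions."""
    img = rgb.astype(np.float32)

    # Contrast scaling around the global mean, then a brightness offset.
    mean = img.mean()
    img = (img - mean) * contrast + mean + brightness

    # Additive zero-mean Gaussian noise.
    img += rng.normal(0.0, gauss_sigma, size=img.shape)

    # Approximate Gaussian blur: separable [1, 2, 1] / 4 kernel, rows then columns.
    k = np.array([0.25, 0.5, 0.25], dtype=np.float32)
    for _ in range(blur_passes):
        p = np.pad(img, ((1, 1), (0, 0), (0, 0)), mode="edge")
        img = k[0] * p[:-2] + k[1] * p[1:-1] + k[2] * p[2:]
        p = np.pad(img, ((0, 0), (1, 1), (0, 0)), mode="edge")
        img = k[0] * p[:, :-2] + k[1] * p[:, 1:-1] + k[2] * p[:, 2:]

    img = np.clip(img, 0.0, 255.0)

    # Salt-and-pepper noise: about half of the affected pixels go black, half white.
    mask = rng.random(img.shape[:2])
    img[mask < sp_fraction / 2] = 0.0
    img[mask > 1.0 - sp_fraction / 2] = 255.0

    return img.astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.full((64, 64, 3), 128, dtype=np.uint8)
    noisy = add_image_noise(clean, rng)
    print(noisy.shape, noisy.dtype)
```

Passing the random generator in explicitly keeps the perturbation reproducible, which is convenient when comparing models trained with and without the noise.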

Learning to Navigate from Simulation via Spatial and Semantic Information Synthesis with Noise Model Embedding

Chen, Yu, Dong et al.

2019

Preprint

While training an end-to-end navigation network in the real world is usually costly, simulation provides a safe and cheap environment for this training stage. However, training neural network models in simulation raises the problem of how to effectively transfer the model from simulation to the real world (sim-to-real). In this work, we regard the environment representation as a crucial element in this transfer process and propose a visual information pyramid (VIP) model to systematically investigate a practical environment representation. A novel representation composed of spatial and semantic information synthesis is then established accordingly, in which noise model embedding is given particular consideration. To explore the effectiveness of this representation, we compare its performance with representations commonly used in the literature in both simulated and real-world scenarios. Results suggest that our environment representation stands out. Furthermore, we analyze the feature maps to investigate the source of this effectiveness inside the network, which could be illuminating for future research on end-to-end navigation.

“…3) Simulation approaches [20], [21]: This method can artificially create large datasets by changing various background images and by capturing images of the target object from multiple locations. However, the quality of the dataset is usually low for objects that are difficult to simulate, such as deformable objects.…”

Section: B Automatic Annotation (mentioning)

confidence: 99%

Invisible Marker: Automatic Annotation of Segmentation Masks for Object Manipulation

Takahashi, Yonekura

2019

Preprint

We propose an invisible marker for accurate automatic annotation for object manipulation. The invisible marker cannot be seen under visible light, but it becomes visible when ultraviolet light is applied in the dark. By capturing images while alternately switching between visible and invisible light at high speed, massive annotation datasets for objects painted with the invisible marker are created quickly and inexpensively. We show a comparison with manual annotation and demonstrations of semantic segmentation by deep learning for deformable objects such as cloth, liquid, and powder.

* The starred authors contributed equally. † K. Takahashi and K. Yonekura are associated with Preferred Networks, Inc.
