Effective Use of Synthetic Data for Urban Scene Semantic Segmentation

Saleh, Fatemeh Sadat; Aliakbarian, Mohammad Sadegh; Salzmann, Mathieu; Petersson, Lars; Álvarez, Jose M.

doi:10.1007/978-3-030-01216-8_6

Cited by 118 publications

(51 citation statements)

References 51 publications


“…Pan et al show that a careful balance between the instance normalization and batch normalization could enhance a neural network's cross-domain generalization. EUSD [100]. Arguing that object detectors have better generalization capacity in detecting foreground objects (e.g., car, pedestrian, etc.)…”

Section: Other Methods (mentioning)

confidence: 99%

A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes

Zhang; David; Foroosh; et al. 2020

IEEE Trans. Pattern Anal. Mach. Intell.

130


During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving and augmented reality. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data hinders the models' performance. Hence, we propose a curriculum-style learning approach to minimizing the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network, while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.


“…Then the noisy labels were used to guide the training for road scene segmentation. Another increasingly popular way to overcome the lack of large-scale dataset is explored by the usage of synthetic data, such as VEIS [28], SYNTHIA [29], Virtual KITTI [30], and GTA-V [31]. Synthetic data is usually used to augment real training data [29], [32].…”

Section: Related Work (mentioning)

confidence: 99%

“…The SYNTHIA dataset is generated by rendering a virtual city created with the Unity development platform for semantic segmentation of driving scenes. Saleh et al [28] proposed VEIS environment to generate the VEIS dataset which has richer foreground classes of real traffic environments. Our previous works [15] can also be regarded as a synthetic dataset which is transformed from a real largescale conventional image dataset.…”

Section: Related Work (mentioning)

confidence: 99%

Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras

Deng; Yang; et al. 2020

IEEE Trans. Intell. Transport. Syst.

109


Understanding the surrounding environment of the vehicle is still one of the challenges for autonomous driving. This paper addresses 360-degree road scene semantic segmentation using surround view cameras, which are widely equipped in existing production cars. First, to address the large distortion problem in fisheye images, Restricted Deformable Convolution (RDC) is proposed for semantic segmentation, which can effectively model geometric transformations by learning the shapes of convolutional filters conditioned on the input feature map. Second, in order to obtain a large-scale training set of surround view images, a novel method called zoom augmentation is proposed to transform conventional images to fisheye images. Finally, an RDC-based semantic segmentation model is built; the model is trained for real-world surround view images through a multi-task learning architecture by combining real-world images with transformed images. Experiments demonstrate the effectiveness of the RDC in handling images with large distortions, and that the proposed approach performs well using surround view cameras with the help of the transformed images.


“…In the past few years, synthesizing images with 3D models using graphics engines have made a figure and attracted much attention in several fields, including human pose estimations [34], indoor scene understanding [25,24], outdoor/urbane scene understanding [30,32], and object detection [26,33,7]. The use of 3D models falls into one of the following categories: (1) Rendering 3D objects on top of static background real-world images [26,34]; (2) Randomly arranging scenes filled with objects [25,24,30,7]; (3) Using commercial game engine, such as Grand Theft Auto V (GTA V) [28,23,32] and the UnrealCV Project [27,3,33].…”

Section: Image Synthesis In 3D Virtual Worlds (mentioning)

confidence: 99%

SynthText3D: synthesizing scene text images from 3D virtual worlds

Liao; Song; et al. 2020

Sci. China Inf. Sci.


With the development of deep neural networks, the demand for a significant amount of annotated training data has become a performance bottleneck in many fields of research and applications. Image synthesis can generate annotated images automatically and freely, and has gained increasing attention recently. In this paper, we propose to synthesize scene text images from 3D virtual worlds, where precise descriptions of scenes, editable illumination/visibility, and realistic physics are provided. Unlike previous methods, which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as a whole. In this way, complex perspective transforms, various illuminations, and occlusions can be realized in our synthesized scene text images. Moreover, the same text instances with various viewpoints can be produced by randomly moving and rotating the virtual camera, which acts as human eyes. The experiments on the standard scene text detection benchmarks using the generated synthetic data demonstrate the effectiveness and superiority of the proposed method. The code and synthetic data will be made available.

