IRS: A Large Naturalistic Indoor Robotics Stereo Dataset to Train Deep Models for Disparity and Surface Normal Estimation

Wang, Qiang; Zheng, Shizhen; Yan, Qingsong; Deng, Fei; Zhao, Kaixuan; Chu, Xiaowen

doi:10.1109/icme51207.2021.9428423

Cited by 9 publications

(14 citation statements)

References 21 publications


“…For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. In response to this challenge, the computer vision community has developed several photorealistic synthetic datasets and interactive simulation environments that have spurred rapid progress towards the goal of holistic indoor scene understanding [5,6,8,9,13,14,17,20,22,29,31,34,35,37,41,42,43,44,47,50,57,58,59,61,66,68,71,72,75,79,80].…”

Section: Introduction (mentioning)

confidence: 99%

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

Roberts, Ramapuram, Ranjan et al. 2020

Preprint


For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. We address this challenge by introducing Hypersim, a photorealistic synthetic dataset for holistic indoor scene understanding. To create our dataset, we leverage a large repository of synthetic scenes created by professional artists, and we generate 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry. Our dataset: (1) relies exclusively on publicly available 3D assets; (2) includes complete scene geometry, material information, and lighting information for every scene; (3) includes dense per-pixel semantic instance segmentations for every image; and (4) factors every image into diffuse reflectance, diffuse illumination, and a non-diffuse residual term that captures view-dependent lighting effects. Together, these features make our dataset well-suited for geometric learning problems that require direct 3D supervision, multi-task learning problems that require reasoning jointly over multiple input and output modalities, and inverse rendering problems. We analyze our dataset at the level of scenes, objects, and pixels, and we analyze costs in terms of money, annotation effort, and computation time. Remarkably, we find that it is possible to generate our entire dataset from scratch, for roughly half the cost of training a state-of-the-art natural language processing model. All the code we used to generate our dataset is available online.


“…To better show the difference, we compare our dataset with the existing datasets under zero-shot cross-dataset setting. As shown in Table 4 (a), we train our NVDS with existing video depth datasets [33,39,40] on Sintel [6] dataset. With both quantity and diversity, using VDW as the training data yields the best accuracy and consistency.…”

Section: Comparisons With Other Video Depth Methods (mentioning)

confidence: 99%

“…Video Depth Datasets. According to the scenes of samples, existing video depth datasets can be categorized into closed-domain datasets and natural-scene datasets. Closed-domain datasets only contain samples in certain scenes, e.g., indoor scenes [9,33,39], office scenes [34], and autonomous driving [11]. To enhance the diversity of samples, natural-scene datasets are proposed, which use computer-rendered videos [6,40] or crawl stereoscopic videos from YouTube [38].…”

Section: Related Work (mentioning)

confidence: 99%

“…As shown in Table 1, the proposed VDW dataset has significantly larger numbers of video scenes. Compared with the closed-domain datasets [9,11,33,34,39], the videos of VDW are not restricted to a certain scene, which is more helpful to train a robust video depth model. For the natural-scene datasets, our dataset has more than ten times the number of videos as the previous largest dataset WSVD [38].…”

Section: VDW Dataset (mentioning)

confidence: 99%

“…Moreover, we collect a large-scale natural-scene video depth dataset, Video Depth in the Wild (VDW), to support the training of robust learning-based models. Current video depth datasets are mostly closed-domain [9,11,33,34,39]. A few in-the-wild datasets [6,38,40] are still limited in quantity, diversity, and quality, e.g., Sintel [6] only contains 23 animated videos.…”

Section: Introduction (mentioning)

confidence: 99%


School Bus System Redesign Based on Ergonomics Principles - The example of Huazhong University of Science and Technology

Wang, Xu 2023

Human Factors and Systems Interaction


An ergonomics-based optimization of the school bus system of Huazhong University of Science and Technology is proposed to adapt to the wave of intelligent development in the information era and to enhance safety, efficiency, and comfort. Questionnaires, interviews, a literature search, and competitive product analysis are used to understand the pain points of the current situation, the development status of public transport systems, the references of school bus user groups, and so on. These findings determined the school bus system's function design, the CMF design, the product technology, and the key modelling points. The design uses the Global Positioning System, smart touch screens, and other technologies, and combines knowledge of ergonomics and perceptual engineering. After a usability test of the product, the school bus user groups thought that the design had a certain effect.


Multi-scale progressive fusion-based depth image completion and enhancement for industrial collaborative robot applications

Xian, Zhang, Yang et al. 2024

J Intell Manuf

