“…For many fundamental scene understanding tasks, it is difficult or impossible to obtain per-pixel ground truth labels from real images. In response to this challenge, the computer vision community has developed several photorealistic synthetic datasets and interactive simulation environments that have spurred rapid progress towards the goal of holistic indoor scene understanding [5,6,8,9,13,14,17,20,22,29,31,34,35,37,41,42,43,44,47,50,57,58,59,61,66,68,71,72,75,79,80]. …tations (d,e); diffuse reflectance (f); diffuse illumination (g); and a non-diffuse residual image that captures view-dependent lighting effects like glossy surfaces and specular highlights (h).…”