“…The image de-rendering task can also be learned with direct supervision, often using synthetic data, such as ShapeNet [6] objects [28,47], synthetic faces [12,43], near-planar surfaces [24], indoor scenes [22] or other synthetic objects [13,25,41]. However, generating large-scale realistic synthetic data that capture the level of complexity of the real world is often challenging, and hence it remains questionable how well these methods generalize to real images.…”