“…However, for this technology to be used in gaming, simulators, and virtual applications, it needs to be fully controllable. Recent work controls image generation by conditioning the output on different types of inputs, such as natural and synthetic images [17,50,23,46,7,28], landmarks [27,39,8], and semantic maps [14,40,31]. Among these methods, [27,8,15] disentangle images into pose and appearance in an unsupervised way and show control over pose during inference.…”