“…The pipeline of our network is shown in Figure 3. The generator G adopts the improved backbone proposed in iDIH [30], where the image blending layer is added to the UNet-like architecture. Since both input space and output space are different across two domains, we split the first (resp., last) 4 layers in the encoder (resp., decoder) into $E_{rd}$ (resp., $D_{rd}$) and $E_{rl}$ (resp., $D_{rl}$), where $E_{rd}$ (resp., $E_{rl}$) and $D_{rd}$ (resp., $D_{rl}$) are the domain-specific encoder and decoder for the rendered (resp., real) image domain.…”
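The layer split described in the excerpt can be illustrated with a small routing sketch. This is not the authors' implementation: it only lists which layer each domain would pass through, assuming that the layers outside the domain-specific blocks are shared between the two domains (the excerpt implies this but does not state it). The function name `build_layer_plan`, the `shared` labels, and the default depth of 7 layers per side are hypothetical; the image blending layer is omitted.

```python
# Number of domain-specific layers on each side, taken from the excerpt
# ("the first (resp., last) 4 layers in the encoder (resp., decoder)").
NUM_SPECIFIC = 4


def build_layer_plan(domain, enc_depth=7, dec_depth=7):
    """Return the ordered layer names an image of `domain` passes through.

    domain: "rd" (rendered) or "rl" (real).
    enc_depth, dec_depth: total encoder/decoder layers. The default of 7 is an
    assumption for illustration; the excerpt does not give the backbone depth.
    """
    if domain not in ("rd", "rl"):
        raise ValueError("domain must be 'rd' or 'rl'")
    if enc_depth < NUM_SPECIFIC or dec_depth < NUM_SPECIFIC:
        raise ValueError("each side needs at least NUM_SPECIFIC layers")

    plan = []
    # Encoder: the FIRST 4 layers belong to E_rd or E_rl; the rest are
    # assumed to be shared across domains.
    for i in range(enc_depth):
        owner = f"E_{domain}" if i < NUM_SPECIFIC else "E_shared"
        plan.append(f"{owner}.{i}")
    # Decoder: the LAST 4 layers belong to D_rd or D_rl; the earlier ones are
    # assumed to be shared across domains.
    for i in range(dec_depth):
        owner = f"D_{domain}" if i >= dec_depth - NUM_SPECIFIC else "D_shared"
        plan.append(f"{owner}.{i}")
    return plan


plan_rd = build_layer_plan("rd")
plan_rl = build_layer_plan("rl")
# Layers that appear in both plans are the (assumed) shared middle of the network.
shared_layers = sorted(set(plan_rd) & set(plan_rl))
```

Under these assumptions, the two plans differ only in their first four encoder entries and last four decoder entries, which mirrors the idea that the input and output spaces are domain-dependent while the intermediate representation is common.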