Figure 1: Photorealistic stylization results. Given (a) an input pair (top: content, bottom: style), the results of (b) WCT [20], (c) PhotoWCT [21], and (d) our model (WCT²) are shown. Every result is produced without any post-processing. While WCT and PhotoWCT suffer from spatial distortions, our model successfully transfers the style and preserves the fine details.
Abstract

Recent style transfer models have produced promising artistic results. However, given a photograph as the reference style, existing methods are limited by spatial distortions or unrealistic artifacts, which should not occur in real photographs. We introduce a theoretically sound correction to the network architecture that remarkably enhances photorealism and faithfully transfers the style. The key ingredient of our method is wavelet transforms, which naturally fit into deep networks. We propose a wavelet-corrected transfer based on whitening and coloring transforms (WCT²) that allows features to preserve their structural information and the statistical properties of the VGG feature space during stylization. This is the first and only end-to-end model that can stylize a 1024×1024 resolution image in 4.7 seconds, giving a pleasing and photorealistic quality without any post-processing. Last but not least, our model provides stable video stylization without temporal constraints. Our code, generated images, and pre-trained models are all available at ClovaAI/WCT2.
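The whitening and coloring transform (WCT) at the core of this line of work matches the second-order statistics of content features to those of style features. As a rough illustration of that idea (not the paper's implementation, which operates on VGG feature maps inside an encoder-decoder with wavelet pooling), here is a minimal NumPy sketch on generic (channels × positions) feature matrices; the function name and epsilon regularizer are our own choices:

```python
import numpy as np

def whitening_coloring_transform(fc, fs, eps=1e-5):
    """Match the mean and covariance of content features fc to style features fs.

    fc, fs: arrays of shape (C, N) -- C channels, N flattened spatial positions.
    Returns stylized features of shape (C, N). Illustrative sketch only.
    """
    # Center both feature sets
    mc = fc.mean(axis=1, keepdims=True)
    ms = fs.mean(axis=1, keepdims=True)
    fc_c = fc - mc
    fs_c = fs - ms

    # Whitening: remove the content covariance via eigendecomposition
    cov_c = fc_c @ fc_c.T / (fc_c.shape[1] - 1) + eps * np.eye(fc.shape[0])
    ec, Ec = np.linalg.eigh(cov_c)
    whitened = Ec @ np.diag(ec ** -0.5) @ Ec.T @ fc_c

    # Coloring: impose the style covariance on the whitened features
    cov_s = fs_c @ fs_c.T / (fs_c.shape[1] - 1) + eps * np.eye(fs.shape[0])
    es, Es = np.linalg.eigh(cov_s)
    colored = Es @ np.diag(es ** 0.5) @ Es.T @ whitened

    # Re-center with the style mean
    return colored + ms
```

The paper's contribution is not this transform itself but where it is applied: replacing max-pooling/unpooling with wavelet transforms lets the network keep the high-frequency components that plain WCT discards, which is what suppresses the spatial distortions seen in Figure 1.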