Face hallucination has been well studied in the last decade because of its useful applications in law enforcement and entertainment. Promising results on sketch-photo face hallucination have been achieved with classic and, increasingly, deep learning-based methods. However, synthesized photos still lack the crisp fidelity of real photos. More importantly, good results have primarily been demonstrated on very constrained datasets where style variability is very low and, crucially, the sketches are perfectly alignable traces of the ground-truth photos. Realistic applications in entertainment or law enforcement, by contrast, require working with more unconstrained sketches drawn from memory or description, which are not rigidly alignable. In this paper, we develop a new deep learning approach to address these settings. Our image-to-image regression network is trained with a combination of content and adversarial losses to generate crisp photorealistic images, and it contains an integrated spatial transformer network to deal with non-rigid alignment between the domains. We evaluate face synthesis on both classic constrained and unviewed benchmarks, namely CUHK, MGDB, and FSMD. The results qualitatively and quantitatively outperform existing approaches.
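As a rough illustration of what combining content and adversarial losses can look like, the following is a minimal NumPy sketch. It is not the formulation used in this paper: the pixel-wise mean squared error as the content term, the non-saturating form of the adversarial term, the function names, and the weight `adv_weight` are all assumptions made for illustration.

```python
import numpy as np


def content_loss(generated, target):
    """Content term (assumed here to be pixel-wise mean squared error)."""
    return float(np.mean((generated - target) ** 2))


def adversarial_loss(disc_prob_on_generated, eps=1e-8):
    """Adversarial term for the generator, assumed non-saturating:
    -log D(G(sketch)), averaged over the batch.

    disc_prob_on_generated holds the discriminator's probability that
    each generated photo is real.
    """
    return float(-np.mean(np.log(disc_prob_on_generated + eps)))


def generator_loss(generated, target, disc_prob_on_generated, adv_weight=1e-3):
    """Weighted sum of the two terms; adv_weight is a hypothetical value."""
    return (content_loss(generated, target)
            + adv_weight * adversarial_loss(disc_prob_on_generated))


# Toy batch: two 3-channel 8x8 "photos" and a generator output that is
# uniformly 0.1 too bright, with an undecided discriminator (p = 0.5).
rng = np.random.default_rng(0)
target = rng.random((2, 3, 8, 8))
generated = target + 0.1
disc_prob = np.array([0.5, 0.5])

total = generator_loss(generated, target, disc_prob)
```

The content term keeps the output close to the ground-truth photo, while the adversarial term rewards outputs that the discriminator scores as real, which is what pushes the result towards crisper textures.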
HU, LI, SONG, HOSPEDALES: DEEP FACE HALLUCINATION FOR UNVIEWED SKETCHES

The standard viewed-sketch databases are also very constrained, in that there is little variability in conditions such as background, sketch style, and even subject ethnicity (CUHK). However, neither of these assumptions holds in real law enforcement or entertainment applications of sketch-photo synthesis. Here, the sketches and photos are more unconstrained and, crucially, artists are drawing from their imagination or from a description. This means that the sketches are affected by communication and memory imperfections [3,13] as well as by the conventional sketch-photo modality gap. Photo hallucination is therefore a much more complicated mapping than simple colour texturing after rigid alignment. This can be seen in the results of the few studies that test on unviewed forensic sketches after training on viewed benchmarks: the quality of the synthesis results in the unviewed case is much worse [5,14].

In this paper we develop a powerful deep learning-based method for sketch-photo face hallucination that produces crisper images than prior work while addressing the less constrained unviewed setting, which is harder but more practically relevant. We build upon a fully convolutional image-image regression network [5] that can provide a rich non-linear mapping from sketches to photos. To make this mapping learnable, given the lack of a rigid alignment between photos and sketches in the unviewed case, we integrate a modified spatial transformer network (STN) [10] into the regressor. Our STN takes as input the facial geometry defined by detected facial interest points, and non-rigidly warps the sketch and photo into alignment. To enable the synthesis of high fidelity crisp photos, we first extend the image-image regres...