Sentence-based Image Editing (SIE) aims to edit an image using natural language. Because it offers the potential to reduce expensive manual editing, SIE has attracted much interest recently. However, existing methods can hardly produce accurate edits and may even fail at attribute editing when the query sentence contains multiple editable attributes. To cope with this problem, this paper focuses on enhancing the difference between attributes and proposes a novel model called Contrastive Attention Generative Adversarial Network (CA-GAN), which is inspired by contrastive training. Specifically, we first design a novel contrastive attention module to enlarge the editing difference between random combinations of attributes that are formed during training. We then construct an attribute discriminator to ensure effective editing of each attribute. A series of experiments shows that our method produces encouraging results in sentence-based image editing with multiple attributes on the CUB and COCO datasets. Our code is available at https://github.com/Zlq2021/CA-GAN
Introduction
As billions of images are uploaded and shared every day [16,32], image editing has become one of the most in-demand tasks in social media. However, to edit an image as desired, one may have to master professional software such as Adobe Photoshop. In contrast to manual editing, automatic image editing has recently attracted much interest in computer vision. This paper studies the problem of Sentence-based Image Editing (SIE) [7,19,25], which uses natural language to assist image editing automatically. One main