2018
DOI: 10.1007/978-3-030-01219-9_11

Multimodal Unsupervised Image-to-Image Translation

Abstract: Unsupervised image-to-image translation is an important and challenging problem in computer vision. Given an image in the source domain, the goal is to learn the conditional distribution of corresponding images in the target domain, without seeing any examples of corresponding image pairs. While this conditional distribution is inherently multimodal, existing approaches make an overly simplified assumption, modeling it as a deterministic one-to-one mapping. As a result, they fail to generate diverse outputs fr…

Cited by 1,948 publications (2,381 citation statements)
References: 57 publications
“…where z is the activation of the previous convolutional layer, which is first normalized. Then it is scaled by γ and shifted by β, which are parameters generated by a multilayer perceptron adapted from [10]. Compared to existing colorization models [2,34] that incorporate color conditions via a simple element-wise addition, AdaIN allows the model to produce vivid colorizations as shown in Fig.…”
Section: Colorization Network (mentioning)
confidence: 99%
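The normalize, scale, and shift steps described in this statement can be sketched as follows. This is a minimal NumPy illustration and not the citing paper's implementation: the function names `adain` and `mlp_params`, the two-layer MLP, and all shapes and random weights are assumptions made for the example.

```python
import numpy as np

def adain(z, gamma, beta, eps=1e-5):
    """Adaptive instance normalization for one feature map z of shape (C, H, W)."""
    # Per-channel mean and standard deviation over the spatial dimensions.
    mu = z.mean(axis=(1, 2), keepdims=True)
    sigma = z.std(axis=(1, 2), keepdims=True)
    z_norm = (z - mu) / (sigma + eps)
    # Scale by gamma and shift by beta, one value per channel.
    return gamma[:, None, None] * z_norm + beta[:, None, None]

def mlp_params(cond, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping a condition vector to (gamma, beta)."""
    hidden = np.maximum(cond @ w1 + b1, 0.0)  # ReLU
    out = hidden @ w2 + b2                    # length 2 * C
    gamma, beta = np.split(out, 2)
    return gamma, beta

# Illustrative sizes and random weights (assumptions, not values from any paper).
rng = np.random.default_rng(0)
C, H, W, D, HID = 4, 8, 8, 3, 16
z = rng.normal(size=(C, H, W))       # activation of the previous conv layer
cond = rng.normal(size=D)            # condition (for example, a color or style code)
w1, b1 = rng.normal(size=(D, HID)), np.zeros(HID)
w2, b2 = rng.normal(size=(HID, 2 * C)), np.zeros(2 * C)

gamma, beta = mlp_params(cond, w1, b1, w2, b2)
out = adain(z, gamma, beta)
```

After this operation each channel of `out` has mean approximately `beta` and standard deviation approximately `|gamma|`, which is how the condition controls the feature statistics instead of being added element-wise.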
“…For instance, the task of identifying the landscape in images can range from rivers, blue sky, freezing tundras to grasslands, forests, etc. Thus, some works [9,10] have proposed to generate effects such as different seasons artificially to augment the dataset. In this paper, we focus on recognizing natural objects like cat, dog, car, people, etc.…”
Section: Related Work (mentioning)
confidence: 99%
“…The $D_s$ also has 4 residual blocks which is similar to $E^c_i$, except that the feature map size is upsampled by 2 with channel number halved after each stage. For image reconstruction, we generally follow the practice in [7]. Specifically, each $E^a_i$ consists of 5 convolutional layers followed by a global average pooling and a fully-connected layer to obtain the appearance code.…”
Section: Learning Process and Network Architecture (mentioning)
confidence: 99%