2022
DOI: 10.48550/arxiv.2203.01941
Preprint

Autoregressive Image Generation using Residual Quantization

Abstract: For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce the computational cost of considering long-range interactions between codes. However, we postulate that previous VQ cannot shorten the code sequence and generate high-fidelity images together in terms of the rate-distortion trade-off. In this study, we propose a two-stage framework, consisting of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images.
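The residual quantization idea named in the title can be pictured with a short sketch: each feature vector is quantized repeatedly, with every step encoding the residual left over by the previous step, so one position yields a short stack of codes instead of a single code from a much larger codebook. The following minimal NumPy version is illustrative only; the function name, shared codebook, and greedy nearest-neighbor assignment are assumptions for exposition, not the paper's implementation.

import numpy as np

def residual_quantize(z, codebook, depth):
    """Quantize one feature vector into `depth` code indices.

    z:        (dim,) feature vector to quantize
    codebook: (K, dim) shared codebook of embeddings
    depth:    number of residual quantization steps
    Returns the code indices and the reconstruction (sum of chosen codes).
    """
    residual = z.copy()
    codes, recon = [], np.zeros_like(z)
    for _ in range(depth):
        # pick the codebook entry closest to the current residual
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        codes.append(idx)
        recon = recon + codebook[idx]
        residual = residual - codebook[idx]  # quantize what is left over
    return codes, recon

# toy usage: a 4-entry codebook, depth-3 quantization of one feature vector
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 8))
z = rng.normal(size=8)
codes, z_hat = residual_quantize(z, codebook, depth=3)
print(codes, float(np.linalg.norm(z - z_hat)))

Each additional depth step reduces the residual error, which is how a small codebook applied several times can stand in for a much larger one applied once.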

Cited by 3 publications (4 citation statements)
References 25 publications (49 reference statements)
“…Moreover, a transformer is used to learn the auto-regressive priors for sampling the discrete local latent vector. Building on the previous approaches, RQ-VAE [20] proposes a residual feature quantization framework, which enables their model to work with smaller number of representation vectors. Feature quantization has also been used in the discriminator of GANs to increase the stability of the adversarial training [51].…”
Section: Related Work
confidence: 99%
“…In this setup, an image is encoded into a discrete latent representation [38], which improves generation quality when paired with an autoregressive generative prior (most often a transformer [39]). This has led to impressive results in image generation [13,20,43], text-to-image generation [9,25,44], and other tasks. Recent image quantization models improve reconstruction quality by introducing adversarial losses [13], using vision transformer encoders and decoders (ViT) [10,43] as both encoder and decoder, representing discrete codes as a stacked map [20], and more.…”
Section: Related Work
confidence: 99%
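The "stacked map" of discrete codes mentioned in the statement above can be made concrete with a small sketch. The shapes below (an 8×8 spatial grid, depth 4, codebook size 1024) and the raster-scan ordering are assumptions chosen for illustration, not values taken from any of the cited implementations.

import numpy as np

H, W, D, K = 8, 8, 4, 1024          # spatial grid, quantization depth, codebook size
rng = np.random.default_rng(0)
code_map = rng.integers(0, K, size=(H, W, D))

# Raster-scan the spatial grid; each position carries its D stacked codes,
# so the sequence an autoregressive prior has to model is H*W positions long
# rather than one flat string of H*W*D codes.
sequence = code_map.reshape(H * W, D)
print(sequence.shape)   # (64, 4)

Keeping the spatial sequence short is what makes the pairing with an autoregressive transformer prior tractable at high resolutions.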
“…This has led to impressive results in image generation [13,20,43], text-to-image generation [9,25,44], and other tasks. Recent image quantization models improve reconstruction quality by introducing adversarial losses [13], using vision transformer encoders and decoders (ViT) [10,43] as both encoder and decoder, representing discrete codes as a stacked map [20], and more. Such quantization architectures typically use powerful CNNs [12] or ViT [43] encoders and decoders; ViT and CNN-based architectures show good performance reconstructing large image datasets; in this paper, we show that our NeRF-based decoder can also work well in the quantization framework.…”
Section: Related Work
confidence: 99%
“…The first category is models which can fully describe sample characteristics and solve parameters via maximum likelihood estimation, such as the generative model GM [29], autoencoder AE [30], variational autoencoder VAE [31], etc. Lee et al proposed a two-stage framework composed of RQ-VAE and RQ-Transformer to efficiently generate high-resolution images [32]. The second category is models which can generate samples from unknown data of distribution functions by means of learning, such as the Artificial Neural Network (ANN) [33] and the Deep Belief Network (DBN) [34].…”
Section: Related Work
confidence: 99%