TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator

Kim, Doyeon; Joo, Donggyu; Kim, Junmo

doi:10.1109/access.2020.3017881

Cited by 44 publications

(17 citation statements)

References 19 publications


“…To maintain frame-level and video-level coherence, two discriminators were used. Kim et al. [82] proposed the Text-to-Image-to-Video Generative Adversarial Network (TiVGAN) for text-based video generation. The key idea is to begin by learning text-to-single-image generation, then gradually increase the number of images produced, and repeat until the desired video length is reached.…”

Section: Synthesizing Videos Using (mentioning)

confidence: 99%
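The step-by-step idea in the statement above (start from a single image, then grow the number of generated frames until the target video length is reached) can be illustrated with a small schedule helper. This is only a sketch: the function name `frame_schedule`, the doubling factor, and the example target of 16 frames are assumptions made for illustration, not details taken from the TiVGAN paper or the citing survey.

```python
def frame_schedule(target_frames, start=1, growth=2):
    """Return the number of frames generated at each training stage.

    Hypothetical helper: the growth factor (doubling by default) is an
    assumption; the quoted statement only says the frame count increases
    gradually until the desired video length is reached.
    """
    if start < 1 or growth < 2 or target_frames < start:
        raise ValueError("need start >= 1, growth >= 2, target_frames >= start")
    stages = [start]
    # Grow the frame count stage by stage, capping at the target length.
    while stages[-1] < target_frames:
        stages.append(min(stages[-1] * growth, target_frames))
    return stages


if __name__ == "__main__":
    # Stage 1 is text-to-single-image; later stages generate longer clips.
    for stage, n_frames in enumerate(frame_schedule(16), start=1):
        print(f"stage {stage}: train generator to produce {n_frames} frame(s)")
```

In an actual training setup, each stage would presumably reuse the generator weights learned in the previous stage, which is what makes the first (single-image) stage a useful starting point.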

“…To stabilize training and prevent mode collapse, the authors used the Wasserstein GAN with gradient penalty (WGAN-GP) [52] loss and the progressive growing technique of ProGAN [78], which has been shown to generate high-resolution images. VPGAN [66]. Video generation and prediction GAN models. Unconditional video generation: VGAN [165], TGAN [140], FTGAN [125], MoCoGAN [160], DVD-GAN [24]. Conditional video generation: VAE-GAN [103], TGANs-C [140], TFGAN [9], StoryGAN [102], BoGAN [18], TiVGAN [82], LSTM and cGAN [71], RNN based GAN [167], TemporalGAN [195]…”

Section: Video Prediction (mentioning)

confidence: 99%


A review of Generative Adversarial Networks (GANs) and its applications in a wide variety of disciplines -- From Medical to Remote Sensing

Dash, Ye, Wang

2021

Preprint


We look into Generative Adversarial Network (GAN), its prevalent variants and applications in a number of sectors. GANs combine two neural networks that compete against one another using zero-sum game theory, allowing them to create much crisper and discrete outputs. GANs can be used to perform image processing, video generation and prediction, among other computer vision applications. GANs can also be utilised for a variety of science-related activities, including protein engineering, astronomical data processing, remote sensing image dehazing, and crystal structure synthesis. Other notable fields where GANs have made gains include finance, marketing, fashion design, sports, and music. Therefore, in this article we provide a comprehensive overview of the applications of GANs in a wide variety of disciplines. We first cover the theory supporting GAN, GAN variants, and the metrics to evaluate GANs. Then we present how GAN and its variants can be applied in twelve domains, ranging from STEM fields, such as astronomy and biology, to business fields, such as marketing and finance, and to arts, such as music. As a result, researchers from other fields may grasp how GANs work and apply them to their own study. To the best of our knowledge, this article provides the most comprehensive survey of GAN's applications in different fields. CCS Concepts: • General and reference → Surveys and overviews; • Computing methodologies → Computer vision; Neural networks; • Theory of computation → Machine learning theory.


“…Generic video synthesis methods generate videos by sampling from a random distribution [56,57]. To get more control over the generated content, conditional video synthesis works utilize input signals, such as images [7,17], text or language [5,28], and action classes [62]. This enables synthesized videos containing the desired objects as specified by visual information or desired actions as specified by textual information.…”

Section: Introduction (mentioning)

confidence: 99%

“…Existing works on conditional video generation use only one of the possible control signals as inputs [6,28]. This limits the flexibility and quality of the generative process.…”

Section: Introduction (mentioning)

confidence: 99%

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Han, Ren, Lee et al.

2022

Preprint


Most methods for conditional video synthesis use a single modality as the condition. This comes with major limitations. For example, it is problematic for a model conditioned on an image to generate a specific motion trajectory desired by the user since there is no means to provide motion information. Conversely, language information can describe the desired motion, while not precisely defining the content of the video. This work presents a multimodal video generation framework that benefits from text and images provided jointly or separately. We leverage the recent progress in quantized representations for videos and apply a bidirectional transformer with multiple modalities as inputs to predict a discrete video representation. To improve video quality and consistency, we propose a new video token trained with self-learning and an improved mask-prediction algorithm for sampling video tokens. We introduce text augmentation to improve the robustness of the textual representation and diversity of generated videos. Our framework can incorporate various visual modalities, such as segmentation masks, drawings, and partially occluded images. It can generate much longer sequences than the one used for training. In addition, our model can extract visual information as suggested by the text prompt, e.g., "an object in image one is moving northeast", and generate corresponding videos. We run evaluations on three public datasets and a newly collected dataset labeled with facial attributes, achieving state-of-the-art generation results on all four.


“…Recently there has been interest in the generative modeling community in reconstructing videos from lower bitrate alternatives such as text or low dimensional latent spaces [23,26,42]. While there has been significant progress on using generative machine learning to model natural images from text [9,27,31], these approaches are currently unable to produce high-quality videos.…”

Section: Introduction (mentioning)

confidence: 99%