Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis

Balaji, Yogesh; Min, Martin Renqiang; Bai, Bing; Chellappa, Rama; Graf, Hans Peter

doi:10.24963/ijcai.2019/276

Cited by 65 publications

(70 citation statements)

References 6 publications


“…Our result again shows the highest inception score among the compared methods. Next, the video classification accuracy of the generated results is recorded in Table 4 following the settings of previous text-to-video works [21], [23]. Our TiVGAN achieves the highest performance which is very close to the in-set accuracy.…”

Section: Kinetics (mentioning)

confidence: 93%

TiVGAN: Text to Image to Video Generation With Step-by-Step Evolutionary Generator

2020


Advances in technology have led to the development of methods that can create desired visual multimedia. In particular, image generation using deep learning has been extensively studied across diverse fields. In comparison, video generation, especially on conditional inputs, remains a challenging and less explored area. To narrow this gap, we aim to train our model to produce a video corresponding to a given text description. We propose a novel training framework, Text-to-Image-to-Video Generative Adversarial Network (TiVGAN), which evolves frame-by-frame and finally produces a full-length video. In the first phase, we focus on creating a high-quality single video frame while learning the relationship between the text and an image. As the steps proceed, our model is trained gradually on an increasing number of consecutive frames. This step-by-step learning process helps stabilize the training and enables the creation of high-resolution video based on conditional text descriptions. Qualitative and quantitative experimental results on various datasets demonstrate the effectiveness of the proposed method.


“…Dialogue based interaction is studied to control image synthesis, in order to improve complex scene generation progressively [219]-[223]. Meanwhile, text-to-image synthesis is extended to multiple images or videos, where visual consistency is required among the generated images [224]-[226].…”

Section: Other Topics (mentioning)

confidence: 99%

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications

Zhang, Yang, et al. 2020. IEEE J. Sel. Top. Signal Process.

222


Deep learning has revolutionized speech recognition, image recognition, and natural language processing since 2010, each involving a single modality in the input signal. However, many applications in artificial intelligence involve more than one modality. It is therefore of broad interest to study the more difficult and complex problem of modeling and learning across multiple modalities. In this paper, a technical review of the models and learning methods for multimodal intelligence is provided. The main focus is the combination of vision and natural language, which has become an important area in both computer vision and natural language processing research communities. This review provides a comprehensive analysis of recent work on multimodal deep learning from three new angles: learning multimodal representations, the fusion of multimodal signals at various levels, and multimodal applications. On multimodal representation learning, we review the key concept of embedding, which unifies the multimodal signals into the same vector space and thus enables cross-modality signal processing. We also review the properties of the many types of embedding constructed and learned for general downstream tasks. On multimodal fusion, this review focuses on special architectures for the integration of the representation of unimodal signals for a particular task. On applications, selected areas of broad interest in current literature are covered, including caption generation, text-to-image generation, and visual question answering. We believe this review can facilitate future studies in the emerging field of multimodal intelligence for the community.


“…Text-to-Video Synthesis [34] Medical Imaging [55] Multi-Modal Distribution of Pedestrian Trajectories [64] Feature Filter for EEG [72] Semantic-Image-to-Photo Translation Multiple object tracking in UAV videos [35] Reconstruction of turbulent velocity fields [56] Pronunciation Fluency [65] Feature Filter for EEG [73] Generate New Human Poses…”

Section: Architecture Based (mentioning)

confidence: 99%

Generative Adversarial Networks : A Survey

Kakkar, Singh. 2021. Preprint


Generative Modelling has been a very extensive area of research since it finds immense use cases across multiple domains. Various models have been proposed in the recent past including Fully Visible Belief Nets, NADE, MADE, Pixel RNN, Variational Auto Encoders, Markov Chain, and Generative Adversarial Networks. Amongst all the models, Generative Adversarial Networks have been consistently showing huge potential and developments in the area of Art, Music, Semi-Supervised learning, Handling Missing data, Drug Discovery, and unsupervised learning. This emerging technology has reshaped the research landscape in the field of generative modeling. Generative Adversarial Networks (GANs) were introduced by Ian J. Goodfellow et al. in 2014 [1]. Since then, various models have been proposed over the years and are considered state-of-the-art models in generative modeling. In this survey, we provide a comprehensive review of the original GAN model and its modified versions, covering broad topics on model architecture, Objective and Loss Functions, and applications. First, we summarize different architectures proposed along with Objective Functions and loss functions used. Second, we cover the evolution of GAN followed by summarizing comparative analysis for various GANs. Then, we review various applications proposed by various authors which are built over these architectures in different domains. Finally, the technical challenges and several promising directions are highlighted.
