Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/276

Conditional GAN with Discriminative Filter Generation for Text-to-Video Synthesis

Abstract: Developing conditional generative models for text-to-video synthesis is an extremely challenging yet important research topic in machine learning. In this work, we address this problem by introducing the Text-Filter conditioning Generative Adversarial Network (TFGAN), a conditional GAN model with a novel multi-scale text-conditioning scheme that improves text-video associations. By combining the proposed conditioning scheme with a deep GAN architecture, TFGAN generates high-quality videos from text on challe…

Cited by 65 publications (70 citation statements)
References: 6 publications
“…Our result again shows the highest inception score among the compared methods. Next, the video classification accuracy of the generated results is recorded in Table 4 following the settings of previous text-to-video works [21], [23]. Our TiVGAN achieves the highest performance which is very close to the in-set accuracy.…”
Section: Kinetics (mentioning)
confidence: 93%
“…Dialogue based interaction is studied to control image synthesis, in order to improve complex scene generation progressively [219]-[223]. Meanwhile, text-to-image synthesis is extended to multiple images or videos, where visual consistency is required among the generated images [224]-[226].…”
Section: Other Topics (mentioning)
confidence: 99%
“…Text-to-Video Synthesis [34] Medical Imaging [55] Multi-Modal Distribution of Pedestrian Trajectories [64] Feature Filter for EEG [72] Semantic-Image-to-Photo Translation Multiple object tracking in UAV videos [35] Reconstruction of turbulent velocity fields [56] Pronunciation Fluency [65] Feature Filter for EEG [73] Generate New Human Poses…”
Section: Architecture Based (mentioning)
confidence: 99%