2022
DOI: 10.1016/j.neucom.2022.04.126

A survey on multimodal-guided visual content synthesis

Cited by 8 publications (1 citation statement)
References 49 publications
“…A deep network effectively learns representations of visual content and captures complex patterns within the data. It also enables end-to-end learning by mapping the source data to the expected target without the handcrafted features required by prior techniques [5], [17]. AlignDraw [18] and Speech2Vid [19] are the two pioneering deep-learning-based works in text-to-vision and audio-to-vision generation, respectively.…”
Section: Introduction (mentioning)
confidence: 99%