Txt2Vid: Ultra-Low Bitrate Compression of Talking-Head Videos via Text

Tandon, Pulkit; Chandak, Shubham; Pataranutaporn, Pat; Liu, Yimeng; Mapuranga, Anesu M.; Maes, Pattie; Weissman, Tsachy; Sra, Misha

doi:10.48550/arxiv.2106.14014

Cited by 1 publication

(2 citation statements)

References 25 publications

(30 reference statements)


“…Talking head generation works can be broadly classified in three categories based on the type of input they use to generate a talking head: Text-driven [16,33,36], Audio-driven [9,13,18,31,37,43,45], and Video-driven [12,27,29,39,44] Talking Head Generation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Inspired by the recent success of GANs in generating static faces from text [38], Li et al. [16] proposed a method to use text for driving animation parameters of the mouth, upper face and head. Txt2Vid [33] converts the spoken language and facial webcam data into text and transmits it to achieve low-bandwidth video conferencing using talking head generation. However, this method relies heavily on the generated speech, altering the original speaker's voice, prosody, and head movements in the video call.…”

Section: Related Work (mentioning)

confidence: 99%


Audio-Visual Face Reenactment

Agarwal, Mukhopadhyay, Namboodiri, et al. 2022

Preprint


Figure 1: We propose AVFR-GAN, a novel method for face reenactment. Our network takes a source identity, a driving frame, and a small audio chunk associated with the driving frame to animate the source identity according to the driving frame. Our network generates highly realistic outputs compared to previous works like [29] and [30]. Results from our network contain significantly fewer artifacts and handle things like mouth movements, eye movements, etc. in a better manner.