2019
DOI: 10.1145/3306346.3323028
Text-based editing of talking-head video

Abstract (excerpt): …a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.

Cited by 238 publications (153 citation statements) | References 67 publications
“…Not only the visual part: Suwajanakorn et al. [9] presented a method for learning the mapping between speech and lip movements in which speech can also be synthesized, enabling the creation of a fully functional spoof video. Fried et al. [34] demonstrated that speech can be easily modified in any video in accordance with the manipulator's intent while maintaining a seamless audio-visual flow. Averbuch-Elor et al. [8] addressed a different problem: converting still portraits into motion pictures expressing various emotions.…”
Section: Face Manipulation
Confidence: 99%
“…Despite those technical solutions to detect synthetic media and approaches to educate humans on detecting machine-manipulated media (Groh et al. 2019), a further, quite strict idea is to limit the availability of trained generative models. Against this background, it is astounding how unquestioningly papers have been published in recent years in which leap innovations in the generation of fake media, especially videos, are described, although many research groups, for instance the one behind Face2Face, did not release their code (Fried et al. 2019; Ovadya and Whittlestone 2019; Thies et al. 2015, 2016, 2018, 2019). Synthetic videos, no matter if they are generated through Face2Face, DeepFakes, FaceSwap or NeuralTextures, can have all sorts of negative consequences, from harm to individuals and national security to the economy and democracy (Chesney and Citron 2018).…”
Section: Synthetic Media
Confidence: 99%
“…Prior transcript-based audio editing tools use time-aligned text transcripts of spoken audio to automatically group similar sentences, highlight repeated words, and maintain synchronization between multiple speakers [33], support automatic alignment of music with spoken audio [32,31], or enable linked editing between script writing and audio recording and editing [36]. Transcript-based video production systems analyze time-aligned video transcripts to identify points for inserting [14] or removing footage [1], allow for vocally-annotating raw footage [27,41], or enable the synthesis of short segments of talking-head video of puppets [3] and people [4]. Other systems use script transcript analysis to select relevant video clips [18], or leverage linguistic structures to create corresponding graphical structures [45].…”
Section: Transcript-based Audio and Video Editing
Confidence: 99%
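The core mechanism behind the transcript-based editors cited above is simple: each word is aligned to a time span in the audio, so deleting words from the transcript deletes the corresponding audio spans. The sketch below is a minimal illustration of that idea, not code from any of the cited systems; `AlignedWord` and `remove_words` are invented names, and the word-level timestamps are assumed to come from a forced aligner.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class AlignedWord:
    word: str
    start: float  # start time in seconds (hypothetical aligner output)
    end: float    # end time in seconds

def remove_words(samples: List[float], sr: int,
                 alignment: List[AlignedWord],
                 drop: Set[int]) -> List[float]:
    """Splice out the sample spans of the dropped words, keeping the rest."""
    out: List[float] = []
    cursor = 0  # next sample index not yet copied
    for i, w in enumerate(alignment):
        if i in drop:
            s, e = int(w.start * sr), int(w.end * sr)
            out.extend(samples[cursor:s])  # copy audio before the dropped word
            cursor = e                     # skip the word's own samples
    out.extend(samples[cursor:])           # copy the trailing audio
    return out
```

For example, at 10 samples per second with three one-second words, dropping the middle word splices the first and last seconds together. Real systems additionally crossfade at each cut point to avoid audible clicks, and the text-based video editing work extends the same idea to the visual track by synthesizing matching mouth motion.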