Crosscast: Adding Visuals to Audio Travel Podcasts

Xia, Haijun; Jacobs, Jennifer; Agrawala, Maneesh

doi:10.1145/3379337.3415882

“…It segments recordings into pieces for each line of dialogue. [18] obtains transcripts from audio using rev.com and spots locations and visually significant entities (VSEs) using Google NLP toolkit. [75] uses titles as important cues to summarize videos.…”

Section: Text

Citation type: mentioning

confidence: 99%

“…For example, automatic speech recognition techniques cannot meet the expectation of professional editors. They usually require a perfect transcript from video providers or crowdsource [18][19] [20]. The correlations among multi-modality are not well investigated, though some researchers have attempted [21].…”

Section: Introduction

Citation type: mentioning

confidence: 99%

AI Video Editing: a Survey

Zhang, Li, Han, et al. 2022

Preprint


Video editing is a demanding job that requires skilled artists or workers with considerable physical stamina and multidisciplinary knowledge, such as cinematography and aesthetics. Consequently, more and more research focuses on semi-automatic and even fully automatic solutions to reduce the workload. Since conventional methods are usually designed to follow simple guidelines, they lack the flexibility and capability to learn complex ones. Fortunately, advances in computer vision and machine learning make up for the shortcomings of traditional approaches and make AI editing feasible. No survey has yet summarized this emerging research. This paper summarizes the development history of automatic video editing, especially the applications of AI in partial and full workflows. We emphasize video editing and discuss related work from multiple aspects: modality, type of input video, methodology, optimization, dataset, and evaluation metric. We also summarize progress in the image editing domain, i.e., style transfer, retargeting, and colorization, and explore the possibility of transferring those techniques to the video domain. Finally, we give a brief conclusion and explore some open problems.


“…Crosscast utilized heuristic-based algorithms to extract relevant information from audio transcripts for travel podcasts, compose search queries, and retrieve relevant visual content to augment audio travel podcasts [56]. One limitation of using automatically generated content is that the visual styles of content are limited.…”

Section: Visual Content Generation From Natural Language

Citation type: mentioning

confidence: 99%

Crosspower

Xia 2020

Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology

Self Cite


Despite the ubiquity of direct manipulation techniques available in computer-aided design applications, creating digital content remains a tedious and indirect task. This is because applications require users to perform numerous low-level editing operations rather than allowing them to directly indicate high-level design goals. Yet, the creation of graphic content, such as videos, animations, and presentations, often begins with a description of design goals in natural language, such as screenplays, scripts, and outlines. Therefore, there is an opportunity for language-oriented authoring, i.e., leveraging the information found in the structure of a language to facilitate the creation of graphic content. We present a systematic exploration of the identification, graphic description, and interaction with various linguistic structures to assist in the creation of visual content. The prototype system, Crosspower, and its proposed interaction techniques enable content creators to indicate and customize their desired visual content in a flexible and direct manner.


Crosscast: Adding Visuals to Audio Travel Podcasts

Cited by 33 publications

References 27 publications
