Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
DOI: 10.18653/v1/2020.emnlp-main.752

VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

Abstract: A popular multimedia news format nowadays provides users with a lively video and a corresponding news article; it is employed by influential news media including CNN and BBC, and by social media including Twitter and Weibo. In such a case, automatically choosing a proper cover frame for the video and generating an appropriate textual summary of the article can help editors save time and help readers make decisions more effectively. Hence, in this paper, we propose the task of Video-based Multimodal Summarization…

Cited by 29 publications (19 citation statements)
References 31 publications
“…Multimodal Summarization. MSMO (Zhu et al., 2019, 2020; Li et al., 2020) generates textual summarization with related images for news articles. Similarly, our task includes summarizing multimodal documents, but it also involves putting the summary in a structured format such as slides.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Chinese Word Embedding. Different from the English language where words are usually taken as basic semantic units, Chinese words have complicated composition structures revealing their semantic meanings (Li et al., 2020, 2021). More specifically, a Chinese word is often composed of several characters, and most of the characters themselves can be further divided into components such as radicals.…”
Section: Related Work
Type: mentioning
confidence: 99%
“…Multimodal text summarization has been mainly studied in a supervised manner. Text summaries were created by using other modality data as additional input (Li et al., 2020a), and some studies provided not only a text summary but also other modality information as output (Chen and Zhuge, 2018; Zhu et al., 2020; Li et al., 2020b; Fu et al., 2020). Furthermore, most studies summarized a single sentence or document.…”
Section: Related Work
Type: mentioning
confidence: 99%