VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles

Li, Mingzhe; Chen, Xiuying; Gao, Shen; Chan, Zhangming; Zhao, Dongyan; Yan, Rui

doi:10.18653/v1/2020.emnlp-main.752

Cited by 29 publications

(19 citation statements)

References 31 publications


“…Multimodal Summarization. MSMO (Zhu et al., 2019, 2020; Li et al., 2020) generates textual summarization with related images for news articles. Similarly, our task includes summarizing multimodal documents, but it also involves putting the summary in a structured format such as slides.…”

Section: Related Work (mentioning)

confidence: 99%

DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents

Wang

McDuff

et al. 2022

AAAI


Creating presentation materials requires complex multimodal reasoning skills to summarize key concepts and arrange them in a logical and visually pleasing manner. Can machines learn to emulate this laborious process? We present a novel task and approach for document-to-slide generation. Solving this involves document summarization, image and text retrieval, slide structure and layout prediction to arrange key elements in a form suitable for presentation. We propose a hierarchical sequence-to-sequence approach to tackle our task in an end-to-end manner. Our approach exploits the inherent structures within documents and slides and incorporates paraphrasing and layout prediction modules to generate slides. To help accelerate research in this domain, we release a dataset of about 6K paired documents and slide decks used in our experiments. We show that our approach outperforms strong baselines and produces slides with rich content and aligned imagery.



“…Chinese Word Embedding. Different from the English language where words are usually taken as basic semantic units, Chinese words have complicated composition structures revealing their semantic meanings (Li et al., 2020, 2021). More specifically, a Chinese word is often composed of several characters, and most of the characters themselves can be further divided into components such as radicals.…”

Section: Related Work (mentioning)

confidence: 99%

Unsupervised Mitigating Gender Bias by Character Components: A Case Study of Chinese Word Embedding

Chen

Li

Yan

et al. 2022

Proceedings of the 4th Workshop on Gender Bias in Natural Language Processing (GeBNLP)

Self Cite


Word embeddings learned from massive text collections have demonstrated significant levels of discriminative biases. However, debiasing on the Chinese language, one of the most spoken languages, has been less explored. Meanwhile, existing literature relies on manually created supplementary data, which is time- and energy-consuming. In this work, we propose the first Chinese Gender-neutral word Embedding model (CGE) based on Word2vec, which learns gender-neutral word embeddings without any labeled data. Concretely, CGE utilizes and emphasizes the rich feminine and masculine information contained in radicals, i.e., a kind of component in Chinese characters, during the training procedure. This consequently alleviates discriminative gender biases. Experimental results show that our unsupervised method outperforms the state-of-the-art supervised debiased word embedding models without sacrificing the functionality of the embedding model.


“…Multimodal text summarization has been mainly studied in a supervised manner. Text summaries were created by using other modality data as additional input (Li et al., 2020a), and some studies provided not only a text summary but also other modality information as output (Chen and Zhuge, 2018; Zhu et al., 2020; Li et al., 2020b; Fu et al., 2020). Furthermore, most studies summarized a single sentence or document.…”

Section: Related Work (mentioning)

confidence: 99%

Self-Supervised Multimodal Opinion Summarization

Im

Kim

Cho

et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Recently, opinion summarization, which is the generation of a summary from multiple reviews, has been conducted in a self-supervised manner by considering a sampled review as a pseudo summary. However, non-text data such as image and metadata related to reviews have been considered less often. To use the abundant information contained in non-text data, we propose a self-supervised multimodal opinion summarization framework called MultimodalSum. Our framework obtains a representation of each modality using a separate encoder for each modality, and the text decoder generates a summary. To resolve the inherent heterogeneity of multimodal data, we propose a multimodal training pipeline. We first pretrain the text encoder-decoder based solely on text modality data. Subsequently, we pretrain the non-text modality encoders by considering the pretrained text decoder as a pivot for the homogeneous representation of multimodal data. Finally, to fuse multimodal representations, we train the entire framework in an end-to-end manner. We demonstrate the superiority of MultimodalSum by conducting experiments on Yelp and Amazon datasets.

