Text-Image-Video Summary Generation Using Joint Integer Linear Programming

Jangra, Anubhav; Jatowt, Adam; Hasanuzzaman, Mohammad; Saha, Sriparna

doi:10.1007/978-3-030-45442-5_24

Cited by 15 publications

(28 citation statements)

References 27 publications

“…Learning process (LP): A lot of work has been done in both supervised learning [13,57,62,133,134] and unsupervised learning [25,26,44,45,46,58]. It can be observed that a large fraction of supervised techniques adopt deep neural networks to tackle the problem [12,57,62,133], whereas in unsupervised techniques a large diversity of techniques have been adopted including deep neural networks [13], integer linear programming [44], differential evolution [45,46], submodular optimization [58] etc.…”

Section: On the Basis Of Methods

mentioning

confidence: 99%

“…Since a major focus of this survey is on MMS tasks with text as the central modality, the number of text documents in input can also be one way of categorizing the related works. Depending upon whether the textual input is single-document [13,57,133] or multi-document [44,45,46,58], the summarization strategies might differ.…”

Section: Kind Of Input Text (Kit)

mentioning

confidence: 99%

“…Kind of text summary (KTS): The most widely discussed distinction for text summarization works is the distinction of extractive vs abstractive. Similarly, depending on the nature of an output text summary, we can also classify the works in MMS tasks (containing text in the output) into extractive MMS [13,44,45,46,58] and abstractive MMS [12,57,133,134].…”

Section: Content Intensity (Ci)

mentioning

confidence: 99%

“…Base modality (BS): Based on central-modality (defined in Section 2), existing works can also be distinguished depending on the base modality around which the final output as well as the model are formulated. A large portion of the prior work adopts either a text-centric approach [12,44,46,57,58,62,133] or a video-centric approach [25,26,95,115].…”

Section: Content Intensity (Ci)

mentioning

confidence: 99%

“…This alarmingly increasing amount of content on the Internet makes it difficult for the users to receive useful information from the torrent of sources, necessitating research on the task of multi-modal summarization (MMS). Various studies have shown that including multi-modal data as input can indeed help improve the summary quality [44,58]. Zhu et al. [133] claimed that on average having a pictorial summary can improve the user satisfaction by 12.4% over a plain text summary.…”

Section: Introduction

mentioning

confidence: 99%

A Survey on Multi-modal Summarization

Jangra, Mukherjee, Jatowt et al. 2021

Preprint

Self Cite

The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this paper, we present a comprehensive survey of the existing research in the area of MMS.
