Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/577

Multi-modal Sentence Summarization with Modality Attention and Image Filtering

Abstract: In this paper, we introduce a multi-modal sentence summarization task that produces a short summary from a sentence-image pair. This task is more challenging than sentence summarization: it must not only incorporate visual features effectively into a standard text summarization framework, but also avoid noise from the image. To this end, we propose a modality-based attention mechanism that pays different attention to image patches and text units, and we design image filters to selectively use visua…

Citations: cited by 56 publications (70 citation statements)
References: 8 publications
“…Multimodal Attention. To fuse the text and visual context information, we add a multimodal attention layer (Li et al, 2018a), as shown in Fig. 2.…”
Section: Multimodal Attention Model (mentioning)
confidence: 99%
“…Video summarization [17,28,30] is also a major sub-domain of multi-modal summarization. A few deep learning frameworks [2,11,31] show promising results, too. Li et al [12] uses an asynchronous dataset containing text, images and videos to generate a textual summary.…”
Section: Related Work (mentioning)
confidence: 99%
“…Summarization can help tackle this problem by distilling the most significant information from the plethora of available content. Recent research in summarization [2,11,31] has proven that having multi-modal data can improve the quality of summary in comparison to uni-modal summaries. Multi-modal information can help users gain deeper insights.…”
Section: Introduction (mentioning)
confidence: 99%
“…Inspired by the above observations, we propose a model called Attribute-aware Sequence Network (ASN) to consider attribute information into review summarization. Specifically, ASN is based on sequence to sequence models (S2S), which are popular methods in text summarization (Rush et al, 2015;See et al, 2017;Li et al, 2018a) and review summarization (Wang and Ling, 2016;Ma et al, 2018). ASN updates over standard S2S are three-fold.…”
Section: Introduction (mentioning)
confidence: 99%