2018
DOI: 10.1016/j.datak.2018.07.006
Knowledge-rich image gist understanding beyond literal meaning

Abstract: We investigate the problem of understanding the message (gist) conveyed by images and their captions as found, for instance, on websites or news articles. To this end, we propose a methodology to capture the meaning of image-caption pairs on the basis of large amounts of machine-readable knowledge that has previously been shown to be highly effective for text understanding. Our method identifies the connotation of objects beyond their denotation: where most approaches to image understanding focus on the denota…

Cited by 11 publications (4 citation statements)
References 66 publications
“…Without this knowledge and previous experience, it is not possible to correctly classify or understand the image, because it is an entirely new pattern to be recognized. Previously gained experience allows new observed patterns to be compared with previously recognized ones and referred to in the process of understanding [6]. In the described model of knowledge-based perception, it is also difficult to correctly classify patterns that are already known but are shown in an unusual situation, because the expectations generated in the cognitive model for such a pattern are completely different.…”
Section: Perceptual Inference Model
confidence: 99%
“…They detect whether the image and the text make the same point, whether one modality is unclear without the other, whether the modalities, when considered separately, imply opposing ideas, and whether one of the modalities is sufficient to convey the message. Weiland et al. (2018) focus on detecting whether captions of images contain complementary information. Vempala and Preoţiuc-Pietro (2019) infer relationship categories between the text and image of Twitter posts to see how the meaning of the entire tweet is composed.…”
Section: Related Work
confidence: 99%
“…Collected multimodal corpora. Recent computational work has examined diverse multimodal corpora collected from in-vivo social processes, e.g., visual/textual advertisements (Hussain et al., 2017; Ye and Kovashka, 2018), images with non-literal captions in news articles (Weiland et al., 2018), and image/text instructions in cooking how-to documents (Alikhani et al., 2019). In these cases, multimodal classification tasks are often proposed over these corpora as a means of testing different theories from semiotics (Barthes, 1988; O'Toole, 1994; Lemke, 1998; O'Halloran, 2004, inter alia); unlike many VQA-style datasets, they are generally not specifically balanced to force models to learn crossmodal interactions.…”
Section: Related Work
confidence: 99%