2017
DOI: 10.1007/s11042-017-5151-6

Answering why-not questions on semantic multimedia queries

Cited by 5 publications (3 citation statements)
References 28 publications
“…A common way to define the visual relationships in an image with a triplet, where two objects are connected by a predicate - for example, <person-adjacent to-bike> or <clock-attach to-building>. Identifying these relationships is useful for a wide range of image understanding tasks, such as captioning (Fang et al 2015), retrieval (Johnson et al 2015), reasoning (Shi, Zhang, and Li 2019; Wang et al 2018), and visual question answering (Xiong, Merity, and Socher 2016). Conventional models for automatically detecting these relationships typically require a relatively large number of training instances to determine the predicates.…”
Section: Introduction (mentioning)
confidence: 99%
“…where i and j represent the positions of the corresponding pixels in the image; (4) According to the linear equation of the acceleration vector, the intersection point P = {p_1, p_2, …, p_s} is calculated. The main calculation is the intersection of all the two straight lines, and the mathematical derivation is as follows [18]:…”
Section: Anomaly Localization Algorithm Based On Single Escape Center (mentioning)
confidence: 99%
“…Equivalently, a scene graph can also be represented as a set of triplets <subject-relation-object>. Scene graph generation is useful for a wide range of image understanding tasks, such as captioning [6, 9-11, 41], retrieval [8, 16], reasoning [26, 31], multimodal knowledge graph [32], and visual question answering [33, 37, 38, 50]. It bridges the gap between visual perception and high-level cognition.…”
Section: Introduction (mentioning)
confidence: 99%