Proceedings of the British Machine Vision Conference 2016
DOI: 10.5244/c.30.111

Mean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task

Abstract: We present Mean Box Pooling, a novel visual representation that pools over CNN representations of a large number of highly overlapping object proposals. We show that such a representation, together with nCCA, a successful multimodal embedding technique, achieves state-of-the-art performance on the Visual Madlibs task. Moreover, inspired by nCCA's objective function, we extend the classical CNN+LSTM approach to train the network by directly maximizing the similarity between the internal representation of the deep le…
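
The pooling step named in the abstract can be illustrated with a minimal sketch. It assumes that each object proposal has already been encoded as a fixed-length CNN feature vector and that "mean" pooling is a plain element-wise average over proposals; the function name `mean_box_pooling` and the toy feature values are hypothetical and are not taken from the paper, and the sketch leaves out proposal generation, the CNN itself, and the nCCA embedding.

```python
from typing import List, Sequence


def mean_box_pooling(box_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-proposal feature vectors into one image-level vector.

    box_features holds one row per object proposal; every row is assumed
    to be a CNN feature vector of the same length (an assumption of this
    sketch, not a detail confirmed by the abstract).
    """
    if not box_features:
        raise ValueError("need at least one proposal feature vector")
    dim = len(box_features[0])
    if any(len(row) != dim for row in box_features):
        raise ValueError("all proposal feature vectors must share one length")
    count = len(box_features)
    # Element-wise mean across proposals, one output value per feature dimension.
    return [sum(row[j] for row in box_features) / count for j in range(dim)]


# Hypothetical stand-in for the CNN features of three overlapping proposals.
toy_features = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [5.0, 6.0, 7.0],
]
image_vector = mean_box_pooling(toy_features)
print(image_vector)  # [3.0, 4.0, 5.0]
```

Averaging keeps the output dimensionality fixed regardless of how many proposals an image yields, which is one plausible reason to pool this way when the number of boxes is large.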

Citations: cited by 2 publications (3 citation statements)
References: 18 publications
“…However, a vast range of research in linguistics has shown that acceptance areas of spatial relation terms vary with the situation of use, including the location of nearby objects, knowledge of how objects work, the purpose of the utterance, and the roles of the parties communicating (Stock & Hall, 2018; Stock & Yousaf, 2018). More recent work has incorporated context in limited ways, including object types (Lan et al., 2012; Malinowski & Fritz, 2015; Platonov & Schubert, 2018) or size (Collell et al., 2018), and text embeddings, which are a vector representation of the semantics of the terms in the description (Bisk et al., 2018; Collell et al., 2018; Malinowski & Fritz, 2015). These latter studies are all in so-called 'tabletop' indoor environments or describe locations in images that take no account of the geographical factors characterizing our own area of research.…”
Section: Previous Work
Citation type: mentioning
confidence: 99%
“…in a factory environment) (Bisk et al., 2018; Platonov & Schubert, 2018), and the retrieval of photographs in response to queries (e.g. find me a photo that shows a boy on a horse) (Collell et al., 2018; Malinowski & Fritz, 2015). In geographical environments, previous work has addressed the generic task of georeferencing, considering only broad urban vs rural contexts (Hall et al., 2011; Hall & Jones, 2021) and characteristics such as scale and geometry type (Stock & Yousaf, 2018), while place size and prominence were addressed in Chen et al. (2018).…”
Section: Previous Work
Citation type: mentioning
confidence: 99%