Mean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task

Malinowski, Mateusz; Mokarian, Ashkan; Fritz, Mario

doi:10.5244/c.30.111

Cited by 2 publications

(3 citation statements)

References 18 publications

“…However, a vast range of research in linguistics has shown that acceptance areas of spatial relation terms vary with the situation of use, including the location of nearby objects, knowledge of how objects work, the purpose of the utterance, and the roles of the parties communicating (Stock & Hall, 2018; Stock & Yousaf, 2018). More recent work has incorporated context in limited ways, including object types (Lan et al, 2012; Malinowski & Fritz, 2015; Platonov & Schubert, 2018) or size (Collell et al, 2018), and text embeddings, which are a vector representation of the semantics of the terms in the description (Bisk et al, 2018; Collell et al, 2018; Malinowski & Fritz, 2015). These latter studies are all in so-called 'tabletop' indoor environments or describe locations in images that take no account of the geographical factors characterizing our own area of research.…”

Section: Previous Work (mentioning)

confidence: 99%

“…in a factory environment) (Bisk et al, 2018; Platonov & Schubert, 2018), and the retrieval of photographs in response to queries (e.g. find me a photo that shows a boy on a horse) (Collell et al, 2018; Malinowski & Fritz, 2015). In geographical environments, previous work has addressed the generic task of georeferencing, considering only broad urban vs rural contexts (Hall et al, 2011; Hall & Jones, 2021) and characteristics such as scale and geometry type (Stock & Yousaf, 2018), while place size and prominence were addressed in Chen et al (2018).…”

Section: Previous Work (mentioning)

confidence: 99%

“…Porters Pass). We are also developing classification versions of the models that will predict acceptability of spatial grid squares, again relative to the location of a reference object (see Collell et al, 2018; Malinowski & Fritz, 2015).…”

Section: Georeferencing Of Individual Toponyms (mentioning)

confidence: 99%

The BioWhere Project: Unlocking the Potential of Biological Collections Data

Stock, Wijegunarathna, Jones et al. (2023), giforum

Vast numbers of biological specimens (e.g. flora, fauna, soils) are stored in collections globally. Many of these have only a natural-language location description, such as '200ft above and south of main highway, 1.1 miles west of Porters Pass', and numerical coordinates are unknown. The BioWhere project is pioneering methods to automatically determine the geographic coordinates (georeferences) of complex location descriptions. Particular challenges are posed by the variable accuracy of recent and historical data that might be used to train models to predict geographic coordinates from the natural-language descriptions; by the presence of historical place names in the descriptions that are not stored in existing gazetteers; and by the vague and context-sensitive nature (e.g. above, on, south of) of the descriptions. We are addressing these challenges by extending the latest transformer-based deep learning models to parse locality descriptions, and to build models for specific spatial terms that incorporate geographic context and data quality to more accurately predict georeferences. We also describe a gazetteer that contains enriched cultural content to support georeferencing of historical records, and to serve as a store of New Zealand Māori cultural knowledge for future generations.

Learning Visual Question Answering by Bootstrapping Hard Attention

Malinowski, Doersch, Santoro et al. (2018), Lecture Notes in Computer Science (self-citation)

Attention mechanisms in biological perception are thought to select subsets of perceptual information for more sophisticated processing which would be prohibitive to perform on all sensory inputs. In computer vision, however, there has been relatively little exploration of hard attention, where some information is selectively ignored, in spite of the success of soft attention, where information is re-weighted and aggregated, but never filtered out. Here, we introduce a new approach for hard attention and find it achieves very competitive performance on recently released visual question answering datasets, equalling and in some cases surpassing similar soft attention architectures while entirely ignoring some features. Even though the hard attention mechanism is thought to be non-differentiable, we found that the feature magnitudes correlate with semantic relevance, and provide a useful signal for our mechanism's attentional selection criterion. Because hard attention selects important features of the input information, it can also be more efficient than analogous soft attention mechanisms. This is especially important for recent approaches that use non-local pairwise operations, whereby computational and memory costs are quadratic in the size of the set of features.
