2021
DOI: 10.48550/arxiv.2103.10574
Preprint

Hopper: Multi-hop Transformer for Spatiotemporal Reasoning

Honglu Zhou,
Asim Kadav,
Farley Lai
et al.

Abstract: This paper considers the problem of spatiotemporal object-centric reasoning in videos. Central to our approach is the notion of object permanence, i.e., the ability to reason about the location of objects as they move through the video while being occluded, contained or carried by other objects. Existing deep learning based approaches often suffer from spatiotemporal biases when applied to video reasoning problems. We propose Hopper, which uses a Multi-hop Transformer for reasoning object permanence in videos.…

Cited by 1 publication (1 citation statement)
References 51 publications
“…For example, considering object-centric representations as independent mechanisms helps realize systematic and zero-shot generalization for image generation (Singh et al., 2022; Chen et al., 2021). Also, the ability to decompose a visual observation into a set of discrete knowledge modules has shown to be useful for visual reasoning (Zhou et al., 2021; Wu et al., 2021).…”
Section: Introduction (citation type: mentioning)
Confidence: 99%