2020
DOI: 10.48550/arxiv.2003.10469
Preprint
Learning Object Permanence from Video

Abstract: Object Permanence allows people to reason about the location of non-visible objects, by understanding that they continue to exist even when not perceived directly. Object Permanence is critical for building a model of the world, since objects in natural visual scenes dynamically occlude and contain each other. Intensive studies in developmental psychology suggest that object permanence is a challenging task that is learned through extensive experience. Here we introduce the setup of learning Object Permanence f…

Cited by 1 publication (6 citation statements) | References 33 publications (39 reference statements)
“…The number of attention heads for the Transformer (and Hopper-transformer) was set to 2, the number of transformer layers was set to 5 to match the 5 hops in our Multi-hop Transformer, and the Transformer dropout rate was set to 0.1. For OPNet-related experiments, we used the implementation provided by the authors (Shamsian et al., 2020). We verified we could reproduce their results under 24 FPS on CATER by using their provided code and trained models.…”
Section: H3 Baselines (mentioning, confidence: 99%)
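The baseline hyperparameters quoted in this statement can be captured in a minimal sketch; the dataclass and field names below are illustrative assumptions, not identifiers from the cited code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerBaselineConfig:
    # Values quoted from the citing paper's baseline setup.
    num_attention_heads: int = 2  # attention heads per layer
    num_layers: int = 5           # matches the 5 hops in the Multi-hop Transformer
    dropout: float = 0.1          # transformer dropout rate


cfg = TransformerBaselineConfig()
```

A frozen dataclass keeps the reported settings immutable, which makes it easy to log or compare configurations across baseline runs.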
“…Outputs from DETR are transformed object representations that are used as inputs to a multilayer perceptron (MLP) to predict the bounding box and class label of every object. For Snitch Localization, DETR is trained on object annotations from LA-CATER (Shamsian et al., 2020).…”
Section: Object Detection and Representation (mentioning, confidence: 99%)
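The prediction heads this statement describes — an MLP mapping each object representation to a bounding box and class logits — can be sketched in outline. The layer sizes and helper names below are illustrative assumptions, not DETR's actual dimensions:

```python
import random

random.seed(0)


def linear(x, w, b):
    # y = W x + b for a single input vector x
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x):
    return [max(0.0, v) for v in x]


def make_layer(n_in, n_out):
    # Random weights stand in for trained parameters (assumption).
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


D, H, NUM_CLASSES = 8, 16, 5  # illustrative sizes, not DETR's real dims

w1, b1 = make_layer(D, H)
w_box, b_box = make_layer(H, 4)                 # bounding box: (cx, cy, w, h)
w_cls, b_cls = make_layer(H, NUM_CLASSES + 1)   # class logits + "no object"


def predict(obj_repr):
    # One shared hidden layer, then separate box and class outputs.
    h = relu(linear(obj_repr, w1, b1))
    return linear(h, w_box, b_box), linear(h, w_cls, b_cls)


box, logits = predict([0.5] * D)
```

Each object query yields a 4-number box and a logit per class (plus a "no object" slot), which is the shape of output the quoted pipeline feeds into Snitch Localization.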