Object Permanence Through Audio-Visual Representations

Bu, Fanjun; Huang, Chien-Ming

doi:10.48550/arxiv.2010.09948


Cited by 1 publication

(1 citation statement)

References 13 publications


“…There are works that, given vision, enhance sounds [30,18], fill in missing sounds [42], and generate sounds entirely from video [32,43]. Further, there have been recent works in integrating vision and sound to improve recognition of environmental properties [3,21,8] and object properties, such as geometry and materials [40,39]. Lastly, there have been works in using audiovisual data for representation learning [33,4,28].…”

Section: Related Work (mentioning)

confidence: 99%

The Boombox: Visual Reconstruction from Acoustic Vibrations

Chen, Chiquier, Lipson et al. 2021

Preprint


We introduce The Boombox, a container that uses acoustic vibrations to reconstruct an image of its inside contents. When an object interacts with the container, they produce small acoustic vibrations. The exact vibration characteristics depend on the physical properties of the box and the object. We demonstrate how to use this incidental signal in order to predict visual structure. After learning, our approach remains effective even when a camera cannot view inside the box. Although we use low-cost and low-power contact microphones to detect the vibrations, our results show that learning from multi-modal data enables us to transform cheap acoustic sensors into rich visual sensors. Due to the ubiquity of containers, we believe integrating perception capabilities into them will enable new applications in human-computer interaction and robotics.
