2020
DOI: 10.48550/arxiv.2001.03063
Preprint

STAViS: Spatio-Temporal AudioVisual Saliency Network

Abstract: We introduce STAViS, a spatio-temporal audiovisual saliency network that combines spatio-temporal visual and auditory information in order to efficiently address the problem of saliency estimation in videos. Our approach employs a single network that combines visual saliency and auditory features and learns to appropriately localize sound sources and to fuse the two saliencies in order to obtain a final saliency map. The network has been designed, trained end-to-end, and evaluated on six different databases th…

Cited by 1 publication (1 citation statement)
References 54 publications
“…TASED-Net [133] works in two stages, i.e., encoder and prediction networks, respectively. STAViS [134] employs one network to combine spatiotemporal visual and auditory information to generate a final saliency map.…”
Section: Saliency Map (mentioning)
confidence: 99%