2019
DOI: 10.48550/arxiv.1901.01342
Preprint

AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection


Cited by 12 publications (35 citation statements)
“…In the video domain, a common multi-modal paradigm involves combining representations from both visual and audio features [4,7,21,32,33,36,48]. Such representations have attracted the interest of the computer vision community, as they allow exploring new approaches to well established problems, such as person reidentification [32,24,54], audio-visual synchronization [1,8,9], speaker diarization [43,47,58], biometrics [33,39], and audio-visual source separation [4,21,36,40,48]. Active speaker detection is a special instance of audiovisual source separation, where sources are the visible persons in a video, and the goal is to detect and assign a segment of speech to one of those candidates.…”
Section: Related Work (mentioning)
confidence: 99%
“…Active speaker detection is a special instance of audiovisual source separation, where sources are the visible persons in a video, and the goal is to detect and assign a segment of speech to one of those candidates. The selected candidate is known as the active speaker [40].…”
Section: Related Work (mentioning)
confidence: 99%
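The quoted definition (assign a speech segment to one of the visible persons, the selected one being the active speaker) can be illustrated with a minimal sketch. It assumes that some upstream audio-visual model has already produced per-frame speaking probabilities for each visible face track. The function name `assign_active_speaker`, the track ids, the mean-score rule, and the `threshold` fallback to `None` are all hypothetical choices for illustration. They are not the AVA-ActiveSpeaker annotation or evaluation protocol.

```python
from typing import Dict, List, Optional


def assign_active_speaker(
    scores: Dict[str, List[float]],
    threshold: float = 0.5,
) -> Optional[str]:
    """Pick the active speaker for one speech segment (illustrative sketch).

    scores: hypothetical mapping from a visible face-track id to per-frame
        speaking probabilities over the segment, assumed to come from some
        upstream audio-visual model.
    threshold: assumed cut-off; a candidate is selected only if its mean
        score strictly exceeds it.

    Returns the track id with the highest mean score, or None when no
    candidate exceeds the threshold (an assumed "no visible speaker" case).
    """
    best_id: Optional[str] = None
    best_mean = threshold
    for track_id, frame_scores in scores.items():
        if not frame_scores:
            # Skip tracks with no frames in this segment.
            continue
        mean = sum(frame_scores) / len(frame_scores)
        if mean > best_mean:
            best_id, best_mean = track_id, mean
    return best_id


if __name__ == "__main__":
    segment = {"face_0": [0.1, 0.2, 0.15], "face_1": [0.8, 0.9, 0.7]}
    print(assign_active_speaker(segment))  # face_1
```

Averaging per-frame scores is only one simple way to turn frame-level evidence into a segment-level decision. Temporal smoothing or a learned aggregation would be equally valid choices under the same definition.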