2020
DOI: 10.1016/j.csl.2019.101027

Voxceleb: Large-scale speaker verification in the wild

Cited by 386 publications (286 citation statements)
References 17 publications
“…The recording condition and audio quality are less than ideal, but, this corpus is suitable for training speaker encoder networks or generalizing any-to-any speaker mapping network. The VoxCeleb database [306] is further a larger scale speech database consisting of about 2,800 hours of untranscribed speech from over 6,000 speakers. This is an appropriate database for training noise-robust speaker encoder networks.…”
Section: Resources (mentioning)
confidence: 99%
“…Stimuli for the speech and non-speech contrast were extracted from large popular datasets for these categories. Speech stimuli were extracted from a human speech-utterance dataset comprising short audio clips of interviews recorded on YouTube (52). Non-speech stimuli were extracted from another large dataset comprising short clips of environmental sounds (53).…”
Section: Stimuli For Synthetic Contrasts (mentioning)
confidence: 99%
“…All utterances crucially degraded with different types of noises including background chatter, laughter, overlapping speech and room acoustics. Although there are a lot of variations in recording devices and channels [18].…”
Section: Corpus (mentioning)
confidence: 99%