Interspeech 2016
DOI: 10.21437/interspeech.2016-1129

The Speakers in the Wild (SITW) Speaker Recognition Database

Abstract: The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology on single and multi-speaker audio acquired across unconstrained or "wild" conditions. The database consists of recordings of 299 speakers, with an average of eight different sessions per person. Unlike existing databases for speaker recognition, this data was not collected under controlled conditions and thus c…

Cited by 229 publications (163 citation statements)
References 6 publications
“…VoxCeleb: The entire dataset involves two parts: VoxCeleb1 and VoxCeleb2. We used SITW [22], a subset of VoxCeleb1 as the evaluation set. The rest of VoxCeleb1 was merged with VoxCeleb2 to form the training set (simply denoted by VoxCeleb).…”
Section: Data (mentioning)
confidence: 99%
“…We report our results using metrics Equal Error Rate (EER) in % and DCF (Detection Cost Function) [21] under two testing conditions of SITW corpus: Core-Core and Assist-Multi [2]. We refer to the adaptation system trained with LT data as Adaptation system LT.…”
Section: Results For Mic-tel Adaptation (mentioning)
confidence: 99%
“…Speaker recognition technology has made great progress in the last decade. The x-vector approach [1] is the current state-of-the-art in this field, providing superior performance in NIST SRE, Speakers In The Wild (SITW) [2] and VoxCeleb datasets [3]. x-vectors is a data-hungry approach, i.e., it requires a huge amount of labeled data (∼ 10k speakers with multiple recordings per speaker) to be properly trained.…”
Section: Introduction (mentioning)
confidence: 99%
“…SITW-Eval.Core: A standard free database collected by [23] for ASV evaluation. It was collected from open-source media channels, and consists of speech data covering 299 well-known persons.…”
Section: A Data (mentioning)
confidence: 99%