Interspeech 2018
DOI: 10.21437/interspeech.2018-1454

Voices Obscured in Complex Environmental Settings (VOiCES) Corpus

Abstract: This paper introduces the Voices Obscured In Complex Environmental Settings (VOiCES) corpus, a freely available dataset under Creative Commons BY 4.0. This dataset will promote speech and signal processing research on speech recorded by far-field microphones in noisy room conditions. Publicly available speech corpora are mostly composed of isolated speech recorded with close-range microphones. A typical approach to better represent realistic scenarios is to convolve clean speech with noise and a simulated room response for…
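The augmentation approach the abstract contrasts with VOiCES — convolving clean speech with a room impulse response and mixing in noise at a target SNR — can be sketched as follows. This is a minimal illustration with synthetic signals; the function and variable names are illustrative and not from any VOiCES tooling.

```python
import numpy as np
from scipy.signal import fftconvolve

def augment(clean, rir, noise, snr_db):
    """Convolve clean speech with a room impulse response and add
    noise scaled to a target SNR in dB. All inputs are 1-D float
    arrays at the same sample rate (illustrative sketch)."""
    # Reverberant speech: truncate the convolution to the clean length.
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    # Scale the noise so that 10*log10(P_speech / P_noise) == snr_db.
    p_speech = np.mean(reverberant ** 2)
    p_noise = np.mean(noise[: len(reverberant)] ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return reverberant + gain * noise[: len(reverberant)]

# Toy example: white-noise "speech", a decaying-exponential "room
# response", and white background noise, mixed at 10 dB SNR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(800) / 100.0)
noise = rng.standard_normal(16000)
noisy = augment(clean, rir, noise, snr_db=10)
```

Real pipelines draw the impulse responses from measured rooms or a simulator and the noise from corpora such as MUSAN, but the mixing arithmetic is the same.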

Cited by 87 publications (66 citation statements). References 5 publications.
“…V19-eval and V19-dev: We use the VOiCES data corpus [22] to evaluate the performance of our system with respect to the baselines on a speaker verification task and perform probing tasks to examine the systems. It consists of recordings collected from 4 different rooms with microphones placed at various fixed locations, while a loudspeaker played clean speech samples from the Librispeech [23] dataset.…”
Section: Datasets
confidence: 99%
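The speaker verification task this citation evaluates on reduces to scoring trials: comparing an enrollment embedding against a test embedding and thresholding the score. A common scoring backend is cosine similarity; the sketch below is illustrative and not the cited system's exact backend, and the threshold value is arbitrary.

```python
import numpy as np

def cosine_score(emb_a, emb_b):
    """Cosine similarity between two speaker embeddings:
    dot product of the length-normalized vectors."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

# A trial is accepted as "same speaker" when the score clears a
# threshold (here 0.7, chosen arbitrarily for illustration).
enroll = np.array([0.2, 0.9, 0.1])
test = np.array([0.25, 0.85, 0.05])
same_speaker = cosine_score(enroll, test) > 0.7
```

In practice the threshold is tuned on a development set (e.g. V19-dev) and performance is reported as equal error rate on the evaluation trials.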
“…Vox: Our training data consists of a combination of the development and test splits of VoxCeleb2 [25] and the development split of VoxCeleb1 [26].…”
Section: Datasets
confidence: 99%
“…For each data augmentation, we randomly choose from 2000 room impulse responses generated from Pyroomacoustics [21], and add randomly selected background noise from MUSAN [22] and AudioSet [23]. For the test set, we used the VOiCES far-field dataset [4], which we believe captures the essence of challenging channel conditions. For all speech utterances, we use 40-dimensional log-mel filterbanks, with 3-second sliding-window mean subtraction.…”
Section: Datasets and Augmentation
confidence: 99%
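The "3-second sliding-window mean subtraction" in the quote above normalizes each feature frame by the local mean of its neighborhood, which suppresses slowly varying channel effects. A minimal sketch, assuming a 10 ms frame hop so 3 seconds is roughly 300 frames; the function name and window default are illustrative:

```python
import numpy as np

def sliding_cmn(feats, window=300):
    """Sliding-window cepstral mean normalization: subtract from
    each frame the mean over a window of surrounding frames
    (300 frames ~ 3 s at a 10 ms hop). `feats` is a
    (num_frames, num_bins) log-mel feature matrix."""
    out = np.empty_like(feats)
    half = window // 2
    n = len(feats)
    for t in range(n):
        # Window is clipped at the utterance edges.
        lo, hi = max(0, t - half), min(n, t + half)
        out[t] = feats[t] - feats[lo:hi].mean(axis=0)
    return out
```

On a constant input the output is all zeros, since every local mean equals the frame itself; toolkit implementations differ mainly in edge handling and whether variance is also normalized.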
“…A number of speaker recognition systems based on deep neural network (DNN) embeddings have been reported in the literature [1][2][3]. More recently, SRI developed the VOiCES dataset [4] specifically for far-field speaker recognition, and showed that their DNN embeddings significantly outperformed i-vector systems [5].…”
Section: Introduction
confidence: 99%