Interspeech 2019
DOI: 10.21437/interspeech.2019-2441

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

Abstract: This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use. It is derived from the original audio and text materials of the LibriSpeech corpus, which has been used for training and evaluating automatic speech recognition systems. The new corpus inherits desired properties of the LibriSpeech corpus while addressing a number of issues that make LibriSpeech less than ideal for text-to-speech work. The released corpus consists of 585 hours of speech data at a 24 kHz sampling rate fro…

Cited by 423 publications (198 citation statements)
References 27 publications
“…Some utterances start or end in the middle of a sentence, leading to unnatural pronunciation at the beginning and end of utterances. These problems were also addressed in [25]. To remove unnatural pauses, and long pauses in general, we apply the FFmpeg silenceremove filter with a threshold of -40 dB.…”
Section: Data Preprocessing
confidence: 99%
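The threshold-based trimming described in the quoted passage can be sketched in plain Python. This is a minimal illustration of the idea (frame-wise RMS against a dB threshold), not the FFmpeg filter itself; the function name, frame length, and RMS criterion are my own assumptions, and samples are assumed normalized to [-1, 1].

```python
import numpy as np

def trim_silence(samples, sample_rate, threshold_db=-40.0, frame_ms=20):
    """Drop leading and trailing frames whose RMS energy falls below
    threshold_db relative to full scale (samples assumed in [-1, 1])."""
    frame_len = int(sample_rate * frame_ms / 1000)
    threshold = 10.0 ** (threshold_db / 20.0)  # -40 dB -> 0.01 linear
    n_frames = len(samples) // frame_len
    voiced = [
        np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2)) >= threshold
        for i in range(n_frames)
    ]
    if not any(voiced):
        return samples[:0]  # entirely silent input
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)
    return samples[first * frame_len:(last + 1) * frame_len]
```

The cited work applies FFmpeg's silenceremove filter directly; an invocation along the lines of `ffmpeg -i in.wav -af "silenceremove=start_periods=1:start_threshold=-40dB:stop_periods=1:stop_threshold=-40dB" out.wav` performs the corresponding edge trimming (exact options depend on the FFmpeg version and desired behavior).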
“…Here we explore the impact of training Conv-TasNet and the deep encoder/decoder on a larger, more diverse training set: LibriTTS [30]. Our goal is to compare the SI-SNRi performance of these two architectures when using the WSJ and LibriTTS datasets for training and the WSJ, LibriTTS, and VCTK [31] datasets for evaluation.…”
Section: Cross-dataset Evaluation
confidence: 99%
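SI-SNRi, the metric compared in the passage above, is the scale-invariant signal-to-noise ratio of the separated estimate minus that of the unprocessed mixture. A minimal sketch of the standard computation (function names and the `eps` stabilizer are my own; signals are zero-meaned as is conventional):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB: project est onto ref, compare target vs. residual energy."""
    est = est - est.mean()
    ref = ref - ref.mean()
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))

def si_snr_improvement(est, mix, ref):
    """SI-SNRi: gain of the separated estimate over the unprocessed mixture."""
    return si_snr(est, ref) - si_snr(mix, ref)
```

Because the estimate is projected onto the reference, the metric is invariant to rescaling the estimate, which is what makes it a fair comparison across models with different output gains.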
“…We train our models using the LJSpeech (LJS) dataset [16], the Sally dataset, a proprietary single-speaker dataset with 20 hours of audio, and a subset of LibriTTS [17]. All datasets used in our experiments consist of read speech.…”
Section: Methods
confidence: 99%