2018
DOI: 10.1109/taslp.2018.2828980

Speech Enhancement of Noisy and Reverberant Speech for Text-to-Speech

Abstract: Text-to-speech voices created from noisy and reverberant recordings are of lower quality. A simple way to improve this is to increase the quality of the recordings prior to text-to-speech training with speech enhancement methods such as noise suppression and dereverberation. In this paper we opted for this approach and to perform the enhancement we used a recurrent neural network. The network is trained with parallel data of clean and lower quality recordings of speech. The lower quality data was artif…

Cited by 39 publications (18 citation statements)
References 36 publications (55 reference statements)
“…A multi-speaker reverberant speech database 3 [29] was used in our experiments. From the database, we used a reverberant subset of 28 speakers that contained 11,572 utterances and 18 reverberation types (9 rooms × 2 microphones positions).…”
Section: Data and Feature Configuration (mentioning)
confidence: 99%
“…We used an open source toolkit [33] to blindly estimate T60 from the reverberant speech. The T60 estimation errors were calculated as the difference between the estimated T60 and the ground-truth T60 (T60n) reported in the database paper [29].…”
Section: Objective Evaluation - T60 Comparisons - T60 Estimation Errors (mentioning)
confidence: 99%
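The error calculation described in the statement above can be sketched in a few lines. This is only an illustration, not the citing authors' code: the function names, the sign convention (estimate minus ground truth), the unit (seconds), and the numeric values in the usage lines are all assumptions, and the blind T60 estimator itself (the toolkit cited as [33]) is not reproduced.

```python
from typing import List, Sequence


def t60_errors(estimated: Sequence[float], ground_truth: Sequence[float]) -> List[float]:
    """Signed T60 error per condition: estimated minus ground truth (assumed to be in seconds)."""
    if len(estimated) != len(ground_truth):
        raise ValueError("estimated and ground_truth must have the same length")
    return [est - ref for est, ref in zip(estimated, ground_truth)]


def mean_absolute_error(errors: Sequence[float]) -> float:
    """Average magnitude of the signed errors (an assumed summary statistic)."""
    if not errors:
        raise ValueError("errors must not be empty")
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical values for two reverberation conditions, for illustration only.
errors = t60_errors([0.5, 0.75], [0.25, 1.0])
summary = mean_absolute_error(errors)
```

A signed error keeps over- and under-estimation distinguishable, while the mean absolute error gives a single figure for comparing conditions.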
“…However, their success is limited in mid SNR values. Botinhao et al [8] proposed recently an SE technique for noise robust speech synthesis based on recurrent networks. However, this technique operates in feature domain instead of waveform domain resulting in the implicit introduction of vocoding quality in the enhanced speech.…”
Section: Introduction (mentioning)
confidence: 99%
“…The biggest challenge in building personalized TTS systems is to obtain a high quality training corpus from a particular voice to either build a speaker-dependent model or a speaker-adapted model using a pre-trained base model [4,5]. In any case, the quality of synthetic voices is highly affected by the presence of noise and reverberation in the training corpus [6,7]. One alternative is to identify and discard corrupted data, but this solution is only feasible when a large amount of training data is available, which is not the typical case in TTS personalization [8].…”
Section: Introduction (mentioning)
confidence: 99%
“…However, there are not many studies about the effects of noise, reverberation, and the application of speech enhancement techniques for TTS. The most detailed study we found in the literature is [7], in which the authors evaluate the effects of noise and reverberation on a speaker-adapted TTS system and propose a TF masking method based on a Deep-Neural Network (DNN) to enhance the training data. The objective of this paper is to perform a thorough assessment of how noise and reverberation affect the different statistical models that compose the TTS system or are involved in its training: the Forced-Aligner (FA), the Acoustic Model (AM), and the Duration Model (DM).…”
Section: Introduction (mentioning)
confidence: 99%