2018
DOI: 10.1007/s10772-018-09579-1

Enhancement of esophageal speech obtained by a voice conversion technique using time dilated Fourier cepstra

Cited by 7 publications (6 citation statements)
References 42 publications

“…The time dilated Fourier Cepstra, as defined and detailed in [18], is used to enhance the esophageal speech in the frequency domain, by dilating the frequency axis of ratio 1/α. Thus, the frequency components will be changed, without corrupting the speech signal.…”
Section: The Time Dilated Fourier Cepstra Methods
Citation type: mentioning
confidence: 99%
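
The quoted statement describes dilating the frequency axis by a ratio 1/α. As a rough illustration of that idea only, and not the procedure defined in [18], the sketch below resamples a one-sided magnitude spectrum so that content at bin k moves to bin k/α. The function name `dilate_frequency_axis`, the use of linear interpolation, and the zero fill beyond the last input bin are all assumptions made for this example.

```python
def dilate_frequency_axis(magnitude, alpha):
    """Stretch (alpha < 1) or compress (alpha > 1) a magnitude spectrum.

    Illustrative assumption: output bin k reads input bin alpha * k with
    linear interpolation, so a component at input bin k0 lands near k0 / alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n_bins = len(magnitude)
    dilated = []
    for k in range(n_bins):
        pos = alpha * k  # fractional input bin read by output bin k
        if pos > n_bins - 1:
            # No input data beyond the last bin: fill with zero.
            dilated.append(0.0)
            continue
        lo = int(pos)
        hi = min(lo + 1, n_bins - 1)
        frac = pos - lo
        dilated.append((1.0 - frac) * magnitude[lo] + frac * magnitude[hi])
    return dilated


# Usage: a single spectral peak at bin 10, dilated with alpha = 0.5.
spectrum = [0.0] * 64
spectrum[10] = 1.0
stretched = dilate_frequency_axis(spectrum, 0.5)
peak_bin = stretched.index(max(stretched))
```

With α = 0.5 the peak moves from bin 10 to bin 20, i.e. the frequency axis is scaled by 1/α = 2 while the shape of the spectrum is otherwise preserved.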
“…The source (x_n) and target (y_n) vectors previously aligned by the DTW algorithm are concatenated together into an extended vector z_n = [x_n, y_n] and then the GMM parameters that model the joint probability density are estimated. • DNN: the DNN-based VC system was implemented based on the approach of [10].…”
Section: Experimental Setups
Citation type: mentioning
confidence: 99%
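
The alignment and concatenation step in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing paper's implementation: it uses a textbook DTW with Euclidean frame distance, and the names `dtw_path` and `joint_vectors` are hypothetical. GMM estimation is left out; the resulting joint vectors would be passed to whatever GMM estimator is in use.

```python
def dtw_path(source, target):
    """Return the DTW alignment path as a list of (source_idx, target_idx)."""
    def dist(a, b):
        # Euclidean distance between two feature vectors (an assumption).
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    n, m = len(source), len(target)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal move
            (cost[i - 1][j], i - 1, j),          # advance source only
            (cost[i][j - 1], i, j - 1),          # advance target only
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def joint_vectors(source, target):
    """Build z_n = [x_n, y_n] for every aligned frame pair."""
    return [list(source[i]) + list(target[j]) for i, j in dtw_path(source, target)]


# Usage: one-dimensional toy features of different lengths.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [1.0], [1.0], [2.0]]
z = joint_vectors(x, y)
```

In this toy case the second source frame is paired with both middle target frames, so `z` contains four joint vectors even though the source has only three frames.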
“…Due to the extensive use of the esophageal voice by laryngectomees, this type of voice has been the subject of numerous studies in the last few years. To our knowledge, the existing approaches for ES quality improvements can be summarized into three categories: approaches based on the transformation of acoustic features, such as formant synthesis [4], comb filtering [5], and smoothing of acoustic parameters [6]; approaches based on statistical techniques, where [7][8][9] have been carried out, and approaches based on the VC technique, which allows for the transformation of the voice of a source speaker (laryngectomee) into that of a target speaker (laryngeal) [10][11][12][13][14][15][16]. Although these approaches have of course improved the estimation of the acoustic characteristics to reconstruct a converted signal with better quality, the improvements in intelligibility and naturalness are still insufficient.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…This conversion function can then be used to convert new OS samples, thereby getting OS speech that has characteristics of HS. In recent times, Deep Neural Networks (DNN) are more popular and effective compared to GMM based methods for enhancement of alaryngeal speech [20][21][22][23] and other types of pathological speech [24,25]. Another attempt to enrich OS was by using the eigenvoices concept [26], which was inspired by the eigenfaces concept [27].…”
Section: Introduction
Citation type: mentioning
confidence: 99%