2012 IEEE 11th International Conference on Signal Processing
DOI: 10.1109/icosp.2012.6491557

Model training using parallel data with mismatched pause positions in statistical esophageal speech enhancement

Cited by 3 publications (2 citation statements)
References 10 publications
“…Before training the LSTM network, it is necessary to align the source and target utterances. Due to the characteristics of the oesophageal speech, there is a very important mismatch between both healthy and oesophageal signals which causes the inadequacy of using a dynamic time warping (DTW) algorithm directly [31]. This is why both signals were labelled at phone level, and then the iterative alignment procedure described in [32] and [33] was applied for each pair of oesophageal and healthy phones.…”
Section: Spectral Conversion; mentioning (confidence: 99%)
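The excerpt describes aligning oesophageal and healthy recordings phone by phone instead of running DTW over whole utterances. A minimal sketch of that idea, assuming plain DTW restricted to each labelled phone pair, is shown here. It is not the iterative alignment procedure of [32] and [33], which is not reproduced. The function names `dtw_path` and `align_by_phones`, the Euclidean frame distance, and the (start, end) frame-boundary format are hypothetical choices made for illustration only.

```python
import numpy as np


def dtw_path(x, y):
    """Plain DTW between feature sequences x (n, d) and y (m, d).

    Returns the minimum-cost warping path as a list of (i, j) frame pairs.
    """
    n, m = len(x), len(y)
    # Pairwise Euclidean distance between every source and target frame
    # (the distance measure is an assumption, not taken from the paper).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last frame pair to (0, 0).
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def align_by_phones(src, tgt, src_bounds, tgt_bounds):
    """Hypothetical phone-constrained alignment.

    src_bounds / tgt_bounds: lists of (start, end) frame indices, one per
    phone, where the k-th source phone is paired with the k-th target phone.
    """
    if len(src_bounds) != len(tgt_bounds):
        raise ValueError("source and target phone sequences must pair one-to-one")
    pairs = []
    for (s0, s1), (t0, t1) in zip(src_bounds, tgt_bounds):
        seg_path = dtw_path(src[s0:s1], tgt[t0:t1])
        # Shift segment-local indices back to utterance-level frame indices.
        pairs.extend((s0 + i, t0 + j) for i, j in seg_path)
    return pairs
```

Restricting DTW to each phone pair means a frame can only be matched to a frame of its paired phone, which is one way to keep a large source/target mismatch from pulling the warping path across phone boundaries. Because phone segments are short, the quadratic cost of each DTW run also stays small.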
“…Although non-parallel training is also possible [20], in VC, a set of parallel source-target sentences is desirable. A set of 50 phonemically balanced sentences in Japanese were used to evaluate the performance and capability of the different VC strategies that were aimed at improving the quality and intelligibility of alaryngeal voices [21], [22], [23], [24], [25], [26], [27].…”
Section: Introduction; mentioning (confidence: 99%)