2013 IEEE International Conference on Acoustics, Speech and Signal Processing 2013
DOI: 10.1109/icassp.2013.6639003

Transmutative voice conversion

Cited by 9 publications (4 citation statements)
References 24 publications
“…We evaluate the two approaches by imposing the F0 contours generated by the two approaches onto recorded natural speech, thereby ensuring that the comparison strictly focused on the quality of the F0 contours and is not affected by other aspects of the synthesis process [27]. To ensure that the F0 contours are properly aligned with the phonetic segment boundaries of the natural utterance, the contours are time warped so that the predicted phonetic segment boundaries correspond to the segment boundaries of the natural utterance.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
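The time warping described in this statement can be illustrated with a minimal sketch. It assumes a piecewise-linear warp between matching segment boundaries, a fixed frame shift, and a contour with no unvoiced gaps; the function name `warp_f0_to_boundaries` and its parameters are hypothetical and are not taken from the cited work.

```python
import numpy as np

def warp_f0_to_boundaries(f0, pred_bounds, nat_bounds, frame_shift=0.005):
    """Time-warp a predicted F0 contour so that its segment boundaries
    (pred_bounds, in seconds) land on the natural boundaries (nat_bounds).

    Assumption: the warp is piecewise linear between boundary pairs.
    """
    f0 = np.asarray(f0, dtype=float)
    pred_bounds = np.asarray(pred_bounds, dtype=float)
    nat_bounds = np.asarray(nat_bounds, dtype=float)

    # Output time axis covers the natural utterance, one value per frame.
    n_out = int(round(nat_bounds[-1] / frame_shift)) + 1
    t_nat = np.arange(n_out) * frame_shift

    # For each natural-time frame, find the corresponding predicted time.
    t_pred = np.interp(t_nat, nat_bounds, pred_bounds)

    # Sample the predicted contour at those (fractional) predicted times.
    t_src = np.arange(len(f0)) * frame_shift
    return np.interp(t_pred, t_src, f0)

# Example: a 1 s rising contour whose first boundary moves from 0.5 s to 0.25 s.
f0_pred = np.linspace(100.0, 200.0, 101)
f0_warped = warp_f0_to_boundaries(
    f0_pred, [0.0, 0.5, 1.0], [0.0, 0.25, 1.0], frame_shift=0.01
)
```

Linear interpolation would smear voiced/unvoiced transitions, so a practical system would likely warp voiced regions separately; that step is omitted here.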
“…Therefore, the statistical averaging effect, which reflects the central tendency of speech features, could introduce oversmoothing [24,34,35]. Frequency warping methods take the physical principles into consideration and aim to warp the frequency axis of the amplitude spectrum to the source speaker to match that of the target speaker [36][37][38][39][40][41]. In this way, the frequency warping methods are able to keep more spectral details and produce high-quality converted speech.…”
Section: Spectral Mapping (citation type: mentioning)
confidence: 99%
“…The second type of FW method defines the warping function by a sequence of aligned frequency axis pairs. Dynamic frequency warping (DFW) technique was proposed in [12,13,14] to minimize the spectral distance between the source and target spectra. This method operates on the high-dimensional spectral feature directly and is able to achieve low spectral distortion.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
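As a rough illustration of the idea in this statement, the sketch below finds a monotonic sequence of aligned frequency-bin pairs by dynamic programming, minimizing the accumulated absolute difference between one source and one target spectrum. It is a simplified stand-in, not the method of [12,13,14]: the local cost, the allowed steps, and the names `dfw_path` and `apply_warp` are all assumptions made for the example.

```python
import numpy as np

def dfw_path(src, tgt):
    """Return a monotonic list of (source_bin, target_bin) pairs that
    minimizes the accumulated |src[i] - tgt[j]| (assumed local cost)."""
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    n, m = len(src), len(tgt)
    cost = np.abs(src[:, None] - tgt[None, :])

    # Accumulated cost with steps (1,0), (0,1), (1,1).
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = cost[i, j] + best

    # Backtrack from the top-right corner to the origin.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((acc[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    return path[::-1]

def apply_warp(src, path, n_target_bins):
    """Move source values onto the target frequency axis along the path.
    When several source bins map to one target bin, the last one wins."""
    src = np.asarray(src, dtype=float)
    warped = np.zeros(n_target_bins)
    for i, j in path:
        warped[j] = src[i]
    return warped

# Toy spectra: a single peak at bin 3 (source) and bin 5 (target).
src_spec = np.zeros(10)
src_spec[3] = 1.0
tgt_spec = np.zeros(10)
tgt_spec[5] = 1.0
path = dfw_path(src_spec, tgt_spec)
warped_spec = apply_warp(src_spec, path, len(tgt_spec))
```

In the toy case the path pairs source bin 3 with target bin 5, so the warped peak lands on the target peak. Practical systems would typically add slope constraints and estimate the path from many aligned frames rather than a single pair.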
“…However, the conversion quality is moderate because the slopes of spectra are not considered. In [15,16,14], low-dimensional spectral features representing the formant positions were used to train the FW functions. A combination of statistical method and FW method was proposed in [17].…”
Section: Introduction (citation type: mentioning)
confidence: 99%