2014 International Conference on Asian Language Processing (IALP) 2014
DOI: 10.1109/ialp.2014.6973508

Effectiveness of multiscale fractal dimension-based phonetic segmentation in speech synthesis for low resource language

Abstract: Phonetic segmentation plays a key role in developing various speech applications. In this work, we propose to use various features for the automatic phonetic segmentation task for forced Viterbi alignment and compare their effectiveness. We propose to use novel multiscale fractal dimension-based features concatenated with Mel-Frequency Cepstral Coefficients (MFCC). The novel features are expected to capture additional nonlinearities in speech production, which should improve the performance of the segmentation task. How…

Cited by 5 publications (2 citation statements)
References 27 publications

“…For text-dependent VC, first task is to align spectral features extracted from the source and target speakers' parallel utterances. It has been proved experimentally that alignment accuracy will impact the quality of speech in speech synthesis [3], [4] as well as in VC [5]. In the case of parallel data, Dynamic Time Warping (DTW) algorithm is used for alignment.…”
Section: Introduction (mentioning)
confidence: 99%
“…Although this model presented some brilliant results regarding speech or speaker recognition techniques [6], [7], [8], [9]. However, its well known that some phenomena can not be captured by this model [10]. The speech instability and turbulence and other fluctuated and nonlinear open and close cycles in larynx all these phenomena can not be estimated well be the traditional source-filer model.…”
Section: AM-FM Modulation Feature (mentioning)
confidence: 99%