Effectiveness of multiscale fractal dimension-based phonetic segmentation in speech synthesis for low resource language

Zaki, Mohammadi; Shah, Jui; Patil, Hemant A.

doi:10.1109/ialp.2014.6973508

Cited by 5 publications

(2 citation statements)

References 27 publications


“…For text-dependent VC, first task is to align spectral features extracted from the source and target speakers' parallel utterances. It has been proved experimentally that alignment accuracy will impact the quality of speech in speech synthesis [3], [4] as well as in VC [5]. In the case of parallel data, Dynamic Time Warping (DTW) algorithm is used for alignment.…”

Section: Introduction (mentioning)

confidence: 99%

Novel Pre-processing using Outlier Removal in Voice Conversion

Rao, Shah, Patil

2016

9th ISCA Speech Synthesis Workshop (SSW 9)

Self Cite


Voice conversion (VC) modifies a speech utterance spoken by a source speaker so that it sounds as if a target speaker were speaking. Gaussian Mixture Model (GMM)-based VC is a state-of-the-art method. It finds the mapping function by modeling the joint density of source and target speakers with a GMM in order to convert spectral features framewise. As with any real dataset, the spectral parameters contain a few points that are inconsistent with the rest of the data, called outliers. Until now, there has been very little literature on the effect of outliers in voice conversion. In this paper, we explore the effect of outliers in voice conversion and their removal as a pre-processing step. To remove these outliers, we use the score distance, which is computed from the scores estimated using Robust Principal Component Analysis (ROBPCA). The outliers are determined using a cutoff value based on the degrees of freedom of a chi-squared distribution. They are then removed from the training dataset, and a GMM is trained on the least outlying points. This pre-processing step can be applied to various methods. Experimental results indicate a clear improvement in both the objective (8%) and the subjective (4% for MOS and 5% for XAB) results.
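The citation statement for this entry notes that, with parallel data, Dynamic Time Warping (DTW) is used to align the source and target speakers' spectral features. As an illustration only, a minimal sketch of textbook DTW follows. It assumes each utterance is a list of equal-dimension feature vectors and uses Euclidean frame distance with the basic three-way step pattern; the function name `dtw_align` is hypothetical, and nothing here is taken from the cited papers' implementations.

```python
import math


def dtw_align(source, target):
    """Align two sequences of feature vectors with basic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching source frame i to target frame j.
    Assumption: Euclidean frame distance and the simple step pattern
    (diagonal, vertical, horizontal) with no slope constraints.
    """
    n, m = len(source), len(target)
    # cost[i][j] = minimal accumulated cost of aligning the first i
    # source frames with the first j target frames.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "features": the target repeats its middle frame.
src = [(0.0,), (1.0,), (2.0,)]
tgt = [(0.0,), (1.0,), (1.0,), (2.0,)]
total, pairs = dtw_align(src, tgt)
print(total, pairs)
```

In a VC pipeline, the aligned frame pairs in `pairs` would be the joint source-target training vectors, which is why alignment errors propagate into the learned mapping.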


“…Although this model has produced some brilliant results in speech and speaker recognition [6], [7], [8], [9], it is well known that some phenomena cannot be captured by it [10]. Speech instability and turbulence, and the fluctuating, nonlinear opening and closing cycles of the larynx, cannot be estimated well by the traditional source-filter model.…”

Section: AM-FM Modulation Feature (mentioning)

confidence: 99%

Modulation Components and Genetic Algorithm for Speaker Recognition System

Hassan, Ajel, Ibrahim

2017

IJACSA


Abstract: In this paper, the aim is to investigate whether changing the filter-bank components of a speaker recognition system can improve its performance in identifying the speaker. The filter bank is composed of 30 Gammatone filter channels. First, the channels are mel-distributed along the frequency axis. Then the component values (center frequencies and bandwidths) change with each run. A genetic algorithm (GA) is adopted to improve the filter component values, which in turn improves the system performance. At each GA run, a new set of filter components is generated that aims to improve on the performance of the previous run. This continues until the system reaches its maximum accuracy or the GA reaches its limits. Results show that the system improves at each run; however, different words might respond differently to changes in the filter bank. Also, in terms of additive noise, the results show that although the digits are affected differently by the noise, the system still improves with each GA run.


Analysis of Features and Metrics for Alignment in Text-Dependent Voice Conversion

Shah, Patil

2017

Lecture Notes in Computer Science
