2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2014
DOI: 10.1109/icassp.2014.6854810

A fixed dimension and perceptually based dynamic sinusoidal model of speech

Abstract: This paper presents a fixed- and low-dimensional, perceptually based dynamic sinusoidal model of speech referred to as PDM (Perceptual Dynamic Model). To decrease and fix the number of sinusoidal components typically used in the standard sinusoidal model, we propose to use only one dynamic sinusoidal component per critical band. For each band, the sinusoid with the maximum spectral amplitude is selected and associated with the centre frequency of that critical band. The model is expanded at low frequencies by i…
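The per-band selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the band edges (Zwicker's Bark-scale critical bands up to 4.4 kHz), the use of the arithmetic midpoint as the centre frequency, and the function name `select_band_sinusoids` are all assumptions made for the example. It handles only static amplitudes; the dynamic (slope) component of the model is not covered.

```python
from bisect import bisect_right

# Assumed critical-band edges in Hz (Zwicker's Bark bands, truncated at 4.4 kHz).
# The paper's actual band definition is not given in the abstract.
BAND_EDGES_HZ = [0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270,
                 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400]


def select_band_sinusoids(peaks, edges=BAND_EDGES_HZ):
    """Keep one sinusoid per critical band.

    peaks: iterable of (frequency_hz, amplitude) spectral peaks.
    Returns one (centre_frequency_hz, amplitude) pair per band; the amplitude
    is None for bands that contain no peak.
    """
    best = [None] * (len(edges) - 1)
    for freq, amp in peaks:
        band = bisect_right(edges, freq) - 1  # index of the band holding freq
        if 0 <= band < len(best):             # ignore peaks outside the bands
            if best[band] is None or amp > best[band]:
                best[band] = amp              # keep the maximum amplitude
    # Associate each selected amplitude with the band's centre frequency
    # (assumed here to be the midpoint of the band edges).
    return [((edges[i] + edges[i + 1]) / 2.0, best[i])
            for i in range(len(best))]


if __name__ == "__main__":
    example_peaks = [(120, 0.5), (180, 0.9), (450, 0.2), (5000, 1.0)]
    for centre, amp in select_band_sinusoids(example_peaks):
        print(centre, amp)
```

Whatever the input, the output has one entry per band, which is what gives the representation a fixed dimension from frame to frame.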

Cited by 5 publications (10 citation statements)
References 11 publications (16 reference statements)
“…For the DIR method, we model log|A| and log|B| explicitly. For this, we proposed in [11] synthesis, HDM is used for generating speech, where amplitudes at each harmonic (|A_HDM|, |B_HDM|) are assigned the amplitude of the centre frequency of the critical band in which they lie. Figure 1 gives an overview of both methods for integrating the DSM into DNN-based speech synthesis (see [12] for more detail).…”
Section: Methods For DSM Parameterisation (mentioning)
confidence: 99%
“…number of sinusoids) is higher than typical source-filter ones and varies from frame to frame. To address this problem, a perceptual dynamic sinusoidal model (PDM) [17] has been proposed to generate high quality speech with a fixed and low number of parameters.…”
Section: Introduction (mentioning)
confidence: 99%
“…In addition, [17] has shown that incorporating the dynamic slope of sinusoids can greatly improve quality in copy synthesis. It is natural, therefore, to consider including this dynamic feature for statistical modelling too.…”
Section: Introduction (mentioning)
confidence: 99%
“…For spectral features, either i) 50 regularized discrete cepstra (RDC) extracted from the amplitudes of the harmonic dynamic model (HDM) [24] or ii) 50 highly correlated log amplitudes from the perceptual dynamic sinusoidal model (PDM) [25] are used as real-valued spectral output. 50 complex amplitudes with minimum phase extracted from PDM [19] are applied as complex-valued spectral output. Continuous log F0 and a voiced/unvoiced (vuv) binary value together with either type of these spectral features are used to represent output features (total dimensions: 52).…”
Section: System Configuration (mentioning)
confidence: 99%
“…This is motivated by the fact that for real-valued classification tasks, a CVNN has the same performance as a real-valued NN with a larger number of neurons [18]. Note that speech synthesis is a regression task, which is different from tasks reported in the literature; iii) Complex amplitudes extracted from [19] can be used as complex-valued outputs where phase is composed of linear phase, minimum phase and disperse phase. Here, linear phase should be omitted in the calculation of the amplitude-phase objective function since analysis window position is unrelated to linguistic input.…”
Section: Introduction (mentioning)
confidence: 99%