2017 25th European Signal Processing Conference (EUSIPCO), 2017
DOI: 10.23919/eusipco.2017.8081516

A variational EM method for pole-zero modeling of speech with mixed block sparse and Gaussian excitation

Abstract: The modeling of speech can be used for speech synthesis and speech recognition. We present a speech analysis method based on pole-zero modeling of speech with mixed block sparse and Gaussian excitation. By using a pole-zero model instead of the all-pole model, a better spectral fit can be expected. Moreover, motivated by the block sparse glottal flow excitation during voiced speech and the white noise excitation for unvoiced speech, we model the excitation sequence as a combination of block spars…

Cited by 1 publication (1 citation statement)
References 21 publications
“…For voiced speech, however, the excitation signal does not resemble a white, Gaussian excitation signal as in the autoregressive process, but is much more accurately modelled by an impulse train [3]. As a consequence of this, many alternative ways of estimating the AR-parameters have been proposed based on the prior knowledge on the power spectral density (PSD) [10,11,5] or the excitation signal [3,12,13]. For example, El-Jaroudi and Makhoul proposed in [10] the discrete all-pole (DAP) approach in which the AR-parameters are estimated by minimising the Itakura-Saito (IS) divergence for a discrete set of points, leading to better performance for voiced speech.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
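
To make the quoted statement more concrete, the following is a minimal sketch of the Itakura-Saito (IS) divergence evaluated on a discrete set of frequency points, which is the quantity the citing paper says the DAP approach minimises. It is not the DAP algorithm itself and not the method of the cited paper: it only evaluates the divergence for given AR-parameters. The function names, the sign convention A(z) = 1 + Σ a_k z^(-k), and the example values (100 Hz fundamental, 8 kHz sampling rate) are assumptions made for illustration.

```python
import numpy as np

def all_pole_psd(a, gain, omegas):
    """PSD of an all-pole model, gain^2 / |A(e^{jw})|^2.

    Assumed convention: A(z) = 1 + sum_k a[k-1] * z^{-k}.
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(1, len(a) + 1)
    # Evaluate A(e^{jw}) at every frequency in omegas (radians/sample).
    A = 1.0 + np.exp(-1j * np.outer(omegas, k)) @ a
    return gain**2 / np.abs(A) ** 2

def itakura_saito(p, p_hat):
    """Mean IS divergence between a target spectrum p and a model spectrum p_hat."""
    ratio = np.asarray(p, dtype=float) / np.asarray(p_hat, dtype=float)
    return float(np.mean(ratio - np.log(ratio) - 1.0))

# Hypothetical voiced frame: evaluate only at harmonics of a 100 Hz
# fundamental, sampled at 8 kHz (39 harmonics stay below Nyquist).
f0 = 2 * np.pi * 100 / 8000
omegas = f0 * np.arange(1, 40)

true_a = [-0.9]                      # illustrative first-order AR model
target = all_pole_psd(true_a, 1.0, omegas)

d_same = itakura_saito(target, all_pole_psd(true_a, 1.0, omegas))
d_other = itakura_saito(target, all_pole_psd([-0.5], 1.0, omegas))
print(d_same, d_other)
```

The divergence is zero when the model spectrum matches the target at every sampled frequency and positive otherwise, so a DAP-style estimator would search over the AR-parameters to drive this value down.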