Data-driven voice source waveform modelling

Thomas, Mark R.; Guðnason, Jón; Naylor, Patrick A.

doi:10.1109/icassp.2009.4960496

Cited by 21 publications

(16 citation statements)

References 15 publications

“…In speech analysis nomenclature, these timing instants are called glottal closure instants (GCIs) and glottal opening instants (GOIs). Applications of GCI and GOI estimation are numerous, including pitch tracking [1], [2], voice source modeling [3]-[6], speech enhancement [7], closed-phase analysis and glottal flow estimation [8]-[11], speaker identification [9], [12], [13], speech dereverberation [14], speech synthesis [15], [16], speech coding [17], speech modification [18], [19] and speech transformations [20].…”

Section: Introduction (mentioning)

confidence: 99%

A Fast Method for High-Resolution Voiced/Unvoiced Detection and Glottal Closure/Opening Instant Estimation of Speech

Koutrouvelis, Kafentzis, Gaubitch, et al. 2016

IEEE/ACM Trans. Audio Speech Lang. Process.

Abstract: We propose a fast speech analysis method which simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative in order to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), the yet another GCI/GOI algorithm (YAGA) and the speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of the aforementioned methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm outperforms the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratio greater than 10 dB.

Index Terms: Glottal closure instants (GCIs), glottal opening instants (GOIs), pitch estimation, speech analysis, voiced/unvoiced detection (VUD).

“…For this reason, existing ABWE approaches often avoid lowband extension altogether. This paper presents a novel method for the extension of narrowband source signals based on an existing spectral mirroring technique and Data-Driven Voice Source Modelling (DDVSM) [7], employing GMMs to establish an explicit mapping between narrowband source features and the wideband source signal. Using an existing ABWE framework [5] that applies HMM-based Bayesian estimation of spectral and temporal envelopes [1], missing frequency content in both high and low bands is synthesized and added to the narrowband signal to form an estimated wideband signal.…”

Section: Introduction (mentioning)

confidence: 99%

“…Data-Driven Voice Source Modelling (DDVSM) [7] is a technique for classifying voice source signals. One such implementation uses a large database of training data to estimate class distributions in the MFCC feature space, from which a set of corresponding 'prototype' time-domain waveforms are derived.…”

Section: Introduction To Data-driven Voice Source Modelling (mentioning)

confidence: 99%
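
The classification idea in the statement above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each glottal cycle has already been length-normalised and paired with a feature vector (standing in for MFCCs), and it swaps in plain k-means with hard assignment for the class-distribution estimate the statement describes. All function names are hypothetical.

```python
def nearest(centroids, vec):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    dists = [sum((c - v) ** 2 for c, v in zip(cen, vec)) for cen in centroids]
    return dists.index(min(dists))


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def fit_classes(features, num_classes, iterations=20):
    """Plain k-means, seeded with the first num_classes feature vectors.

    Assumption: k-means stands in for the class-distribution model.
    """
    centroids = [list(f) for f in features[:num_classes]]
    for _ in range(iterations):
        labels = [nearest(centroids, f) for f in features]
        for c in range(num_classes):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a class is empty
                centroids[c] = mean_vector(members)
    return centroids


def prototypes(features, waveforms, centroids):
    """Average the time-domain cycles assigned to each class."""
    labels = [nearest(centroids, f) for f in features]
    result = []
    for c in range(len(centroids)):
        members = [w for w, lab in zip(waveforms, labels) if lab == c]
        result.append(mean_vector(members) if members else None)
    return result
```

Calling `fit_classes` on the training features and then `prototypes` on the same features and their cycles yields one averaged waveform per class; a class that attracts no cycles returns `None`.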

Voice source estimation for artificial bandwidth extension of telephone speech

Thomas

Guðnason

Naylor

et al. 2010

2010 IEEE International Conference on Acoustics, Speech and Signal Processing

Self Cite

Artificial bandwidth extension (ABWE) of speech signals aims to estimate wideband speech (50 Hz to 7 kHz) from narrowband signals (300 Hz to 3.4 kHz). Applying the source-filter model of speech, many existing algorithms estimate vocal tract filter parameters independently of the source signal. However, many current methods for extending the narrowband voice source signal are limited to straightforward signal processing techniques which are only effective for high-band estimation. This paper presents a method for ABWE that employs novel data-driven modelling and an existing spectral mirroring technique to estimate the wideband source signal in both the high and low extension bands. A state-of-the-art Hidden Markov Model-based estimator evaluates the temporal and spectral envelopes in the missing frequency bands, with which the ABWE speech signal is synthesized. Informal listening tests comparing two existing source estimation techniques and two permutations of the proposed approach show an improvement in the perceived bandwidth of speech signals, in particular towards low frequencies. Subjective tests on the same data show a preference for the proposed techniques over the existing methods under test.

“…Other techniques jointly estimate […] and […] [30] that are not considered here. Re-writing (1) in the time domain, […] (3) where […] are the prediction coefficients, […] is an estimate of […], and […] is the prediction order. The vocal tract transfer function can be approximated as […] (4). The prediction order for an adult male of vocal tract length 17 cm is approximately […], where […] is the sampling frequency.…”

Section: B Estimation By Linear Prediction (mentioning)

confidence: 99%
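
The equations in the statement above did not survive extraction, so the following is only a sketch of the general technique it names. It assumes the textbook autocorrelation form of linear prediction, in which each sample is predicted as a weighted sum of the previous p samples, and solves for the weights with the Levinson-Durbin recursion. The function names and the sign convention are assumptions, not taken from the citing paper.

```python
def autocorrelation(signal, max_lag):
    """Return autocorrelation values r[0..max_lag] of the signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns (coefficients a[1..order], residual energy), using the assumed
    convention s_hat[n] = sum_k a[k] * s[n - k].
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for m in range(1, order + 1):
        if error <= 0.0:  # degenerate input (e.g. all zeros): stop early
            break
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        reflection = acc / error
        updated = a[:]
        updated[m] = reflection
        for k in range(1, m):
            updated[k] = a[k] - reflection * a[m - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error


def predict(signal, coeffs):
    """Predict each sample from the previous len(coeffs) samples."""
    p = len(coeffs)
    return [sum(coeffs[k] * signal[n - 1 - k] for k in range(p))
            for n in range(p, len(signal))]
```

A typical use would compute `autocorrelation(frame, p)`, pass the result to `levinson_durbin`, and subtract `predict(frame, coeffs)` from the frame to obtain the prediction residual. The choice of the order p is left to the caller, since the rule of thumb in the statement is not recoverable here.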

“…The detection of GCIs in voiced speech is important for glottal-synchronous speech processing algorithms such as pitch tracking, prosodic speech modification [1], speech dereverberation [2], data-driven voice source modeling [3] and areas of speech synthesis [4]. Identification of GOIs is necessary for closed-phase linear predictive coding (LPC) [5] and the analysis of pathological speech that relies upon knowledge of the open quotient (OQ) [6].…”

mentioning

confidence: 99%

Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm

Thomas, Guðnason, Naylor 2012

IEEE Trans. Audio Speech Lang. Process.

Self Cite

105

Abstract: Accurate estimation of glottal closing instants (GCIs) and opening instants (GOIs) is important for speech processing applications that benefit from glottal-synchronous processing including pitch tracking, prosodic speech modification, speech dereverberation, synthesis and study of pathological voice. We propose the Yet Another GCI/GOI Algorithm (YAGA) to detect GCIs from speech signals by employing multiscale analysis, the group delay function, and N-best dynamic programming. A novel GOI detector based upon the consistency of the candidates' closed quotients relative to the estimated GCIs is also presented. Particular attention is paid to the precise definition of the glottal closed phase, which we define as the analysis interval that produces minimum deviation from an all-pole model of the speech signal with closed-phase linear prediction (LP). A reference algorithm analyzing both electroglottograph (EGG) and speech signals is described for evaluation of the proposed speech-based algorithm. In addition to the development of a GCI/GOI detector, an important outcome of this work is in demonstrating that GOIs derived from the EGG signal are not necessarily well-suited to closed-phase LP analysis. Evaluation of YAGA against the APLAWD and SAM databases shows that GCI identification rates of up to 99.3% can be achieved with an accuracy of 0.3 ms and GOI detection can be achieved equally reliably with an accuracy of 0.5 ms.

Index Terms: Dynamic programming, electroglottograph (EGG), glottal closing instants (GCIs), glottal opening instants (GOIs), group delay function, multiscale analysis, speech processing.
