A pitch extraction algorithm tuned for automatic speech recognition

Ghahremani, Pegah; BabaAli, Bagher; Povey, Daniel; Riedhammer, Korbinian; Trmal, Jan; Khudanpur, Sanjeev

doi:10.1109/icassp.2014.6854049

Cited by 265 publications

(169 citation statements)

References 9 publications

Mentioning: 162

“…Rilliard [25] and d'Alessandro [26] have suggested using the power of the speech signal instead, easing wRMSE calculation. We have opted for the latter, augmenting it with the POV calculated as detailed by Ghahremani [27]. Incorporating the POV in the weighting eliminates the need to hard-threshold the POV to obtain voicing, making the whole approach more robust.…”

Section: Intonation Similarity Measures (mentioning)

confidence: 99%


Atom decomposition-based intonation modelling

Honnet

Gerazov

Garner

2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Current statistical parametric text-to-speech (TTS) synthesis methods allow production of neutral speech with acceptable quality. However, prosody is often qualified as unsatisfactory and sounding too flat. In this paper, we address intonation modelling for TTS based on physiological aspects of prosody production. A set of gamma distribution shaped atoms is defined and then intonation decomposition is performed using a matching pursuit algorithm. Some preliminary experiments show that this model allows easy extraction of physiologically meaningful atoms that could be used to generate intonation in a TTS system.


“…The Kaldi pitch tracker was used for F0 and probability of voicing (POV) extraction [27]. We used 50ms frame length with 5ms frameshift for extraction.…”

Section: Tools and Settings (mentioning)

confidence: 99%

Atom decomposition-based intonation modelling

Honnet

Gerazov

Garner

2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
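To make the quoted analysis settings concrete, here is a minimal sketch of the frame arithmetic implied by a 50 ms frame length and a 5 ms frame shift. The 16 kHz sample rate and the helper name `count_frames` are assumptions for illustration only; the quote does not state them, and the Kaldi pitch tracker's own edge handling may give a slightly different frame count.

```python
def count_frames(num_samples: int, sample_rate: int,
                 frame_length_ms: float = 50.0,
                 frame_shift_ms: float = 5.0) -> int:
    """Count the full analysis frames that fit in a signal (no padding assumed)."""
    # Convert the window settings from milliseconds to samples.
    frame_len = int(round(sample_rate * frame_length_ms / 1000.0))
    frame_shift = int(round(sample_rate * frame_shift_ms / 1000.0))
    if num_samples < frame_len:
        return 0
    # One frame at offset 0, plus one per full shift that still fits.
    return 1 + (num_samples - frame_len) // frame_shift


# ASSUMPTION: one second of audio sampled at 16 kHz.
frames_in_one_second = count_frames(num_samples=16000, sample_rate=16000)
```

With these settings each frame spans 800 samples and consecutive frames overlap by 90%, which is why F0 and POV values arrive at a rate close to 200 per second.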


“…We will use the WCORR norm, in order to assess the perceptual quality of the modelled F0 using the thresholds discussed in Section 4.2. To extract the continuous F0 and POV estimates we will use the pitch tracker implemented in Kaldi (Ghahremani et al, 2014). The second hypothesis is one of comparison of our generalised CR model with a state-of-the-art implementation of the standard CR model.…”

Section: Experiments Design (mentioning)

confidence: 99%

“…3. We define the weighting function to be (4), where p(i) is the probability of voicing (POV), as defined by Ghahremani et al (2014), and e(i) is the energy contour of the speech signal. This is in accord with newer trends in perceptual intonation studies d'Alessandro et al, 2011).…”

Section: Introduction (mentioning)

confidence: 99%

Intonation modelling using a muscle model and perceptually weighted matching pursuit

Honnet

Gerazov

Gjoreski

et al. 2018

Speech Communication


We propose a physiologically based intonation model using perceptual relevance. Motivated by speech synthesis from a speech-to-speech translation (S2ST) point of view, we aim at a language independent way of modelling intonation. The model presented in this paper can be seen as a generalisation of the command response (CR) model, albeit with the same modelling power. It is an additive model which decomposes intonation contours into a sum of critically damped system impulse responses. To decompose the intonation contour, we use a weighted correlation based atom decomposition algorithm (WCAD) built around a matching pursuit framework. The algorithm allows for an arbitrary precision to be reached using an iterative procedure that adds more elementary atoms to the model. Experiments are presented demonstrating that this generalised CR (GCR) model is able to model intonation as would be expected. Experiments also show that the model produces a similar number of parameters or elements as the CR model. We conclude that the GCR model is appropriate as an engineering solution for modelling prosody, and hope that it is a contribution to a deeper scientific understanding of the neurobiological process of intonation.
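The citation statements for this paper describe a weighting built from the probability of voicing p(i) and the energy contour e(i), used in a weighted correlation (WCORR) between F0 contours. The excerpt does not reproduce the paper's equation (4), so the sketch below assumes the simplest combination, the per-frame product p(i) · e(i); the function name `weighted_correlation` and that product form are illustrative assumptions, not the authors' definition.

```python
import numpy as np


def weighted_correlation(f0_ref, f0_model, pov, energy):
    """Weighted Pearson correlation between two F0 contours.

    ASSUMPTION: the per-frame weight is pov * energy. The quoted paper
    defines its own weighting function, which is not shown in the excerpt.
    """
    x = np.asarray(f0_ref, dtype=float)
    y = np.asarray(f0_model, dtype=float)
    w = np.asarray(pov, dtype=float) * np.asarray(energy, dtype=float)
    total = w.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    w = w / total  # normalise so the weights sum to one

    # Weighted means, covariance and variances.
    mean_x = np.sum(w * x)
    mean_y = np.sum(w * y)
    cov = np.sum(w * (x - mean_x) * (y - mean_y))
    var_x = np.sum(w * (x - mean_x) ** 2)
    var_y = np.sum(w * (y - mean_y) ** 2)
    return cov / np.sqrt(var_x * var_y)
```

Because unvoiced or silent frames receive weights near zero, they contribute little to the score, which matches the motivation given in the quotes: no hard voicing threshold on the POV is needed.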


“…The same Kaldi recipe was used (see https://github.com/bootphon/abkhazia/blob/master/abkhazia/kaldi/kaldi templates/train and decode.sh) with the same parameters and input features to train all models. Input features consisted of 13 MFCC coefficients plus 3 pitch-related features (Ghahremani et al, 2014) and their delta and delta-deltas coefficients. Pitch features were included because tone is contrastive in Mandarin and Vietnamese (i.e.…”

Section: ASR Models (mentioning)

confidence: 99%

ASR Systems as Models of Phonetic Category Perception in Adults

Schatz

Bach

Dupoux

2017

Preprint


We test the potential of standard Automatic Speech Recognition (ASR) systems trained on large corpora of continuous speech as quantitative models of human speech processing. In human adults, speech perception is attuned to efficiently process native speech sounds, at the expense of difficulties in processing non-native sounds. We use ABX-discriminability measures to test whether ASR models can account for the patterns of confusion between speech sounds observed in humans. We show that ASR models reproduce some well-documented effects in non-native phonetic perception. Beyond the immediate results, our methodology opens up the possibility of a more systematic investigation of phonetic category perception in humans.
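The feature layout quoted for this paper (13 MFCC coefficients plus 3 pitch-related features, with delta and delta-delta coefficients) works out to 48 values per frame. The sketch below illustrates that bookkeeping with placeholder arrays. The helper `add_deltas` uses simple finite differences via `numpy.gradient` as a stand-in; that is an assumption for illustration and not the regression-window delta computation a Kaldi recipe would apply.

```python
import numpy as np


def add_deltas(features: np.ndarray) -> np.ndarray:
    """Append first- and second-order time differences to each frame.

    ASSUMPTION: plain finite differences along the time axis, used here
    only to show the resulting feature dimension.
    """
    delta = np.gradient(features, axis=0)
    delta_delta = np.gradient(delta, axis=0)
    return np.concatenate([features, delta, delta_delta], axis=1)


n_frames = 100                      # hypothetical utterance length
mfcc = np.zeros((n_frames, 13))     # placeholder for 13 MFCC coefficients
pitch = np.zeros((n_frames, 3))     # placeholder for 3 pitch-related features
base = np.concatenate([mfcc, pitch], axis=1)   # 16 static features per frame
full = add_deltas(base)                        # 16 * 3 = 48 features per frame
```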

