Time-domain isolated phoneme classification using reconstructed phase spaces

Johnson, Michael T.; Povinelli, Richard J.; Lindgren, A.C.; Ye, Jinjin; Liu, Xiaolin; Indrebo, Kevin M.

doi:10.1109/tsa.2005.848885

Cited by 43 publications

(32 citation statements)

References 27 publications

“…This is because the time complexity of the Viterbi algorithm [1], which is used for the recognition of speech, is far greater for the RPS approach, due to the amount of data [30]. For RPS based methods to become useful, this issue must be solved.…”

Section: Applications to ASR (mentioning)

confidence: 99%

“…Again, GMMs are used to model the RPS features, and are learned using binary-split EM. The number of mixtures used, which was determined empirically in [30], is 128. The classification accuracy for an RPS of dimension 10 with delta dimensions is 38.81%.…”

Section: Fullband RPS (mentioning)

confidence: 99%

“…Since each point in the RPS is treated as a frame, with likelihoods calculated for each model every 62.5 µs for a sampling rate of 16,000 Hz, there are many more frames in this approach than in the standard spectral feature-based methodology. This leads to a recognition time complexity that is on the order of 100 times greater than automatic speech recognition systems based on spectral features [30].…”

Section: Future Directions (mentioning)

confidence: 99%
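
The frame-count argument in the statement above can be checked with a little arithmetic. The sketch below is illustrative only: the 16,000 Hz sampling rate comes from the quoted text, while the 10 ms frame shift for a spectral front end is an assumption (a common choice, not a value stated in the source).

```python
# Rough frame-rate comparison between an RPS front end (one frame per
# sample) and a spectral front end (one frame per frame shift).
SAMPLE_RATE_HZ = 16_000   # from the quoted statement
FRAME_SHIFT_MS = 10       # ASSUMPTION: typical spectral frame shift

# Time between consecutive samples, in microseconds.
sample_period_us = 1e6 / SAMPLE_RATE_HZ

# Each RPS point is treated as a frame, so there is one frame per sample.
rps_frames_per_second = SAMPLE_RATE_HZ

# A spectral front end emits one frame per frame shift.
spectral_frames_per_second = 1000 / FRAME_SHIFT_MS

# How many more likelihood evaluations the RPS approach needs per second.
frame_ratio = rps_frames_per_second / spectral_frames_per_second

print(f"sample period: {sample_period_us} us")
print(f"RPS frames/s: {rps_frames_per_second}")
print(f"spectral frames/s: {spectral_frames_per_second}")
print(f"ratio: {frame_ratio}")
```

Under these assumptions the ratio lands in the low hundreds, which is consistent with the "on the order of 100 times" figure in the statement.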

Sub-banded reconstructed phase spaces for speech recognition

Indrebo

Povinelli

Johnson

2006

Speech Communication

A novel method for classification of speech phonemes, based on the combination of dynamical systems theory and filter banks, is introduced. The benefit of this approach is seen in its ability to model nonlinear characteristics of speech, something that traditional methods cannot do. The modeling tool that provides this capability is the reconstructed phase space. This space carries all the dynamical information present in the signal's underlying system. The reconstructed phase spaces used for modeling and classification of the phonemes are built using frequency sub-banded signals that are generated using a set of band-pass filters. This approach is motivated by empirical evidence that suggests humans process and recognize speech in sub-bands. Modeling and classification is performed on the sub-banded reconstructed phase spaces using Gaussian Mixture Models, and the results of the classifications for each sub-band are combined to form an overall classification. Several methods for the combination of the sub-band classifications are examined, and it is found that an unweighted linear combination produces classification accuracies that are significantly higher than those of a classification system using reconstructed phase spaces of unfiltered signals. Results also demonstrate that the proposed phoneme classification system is competitive with state-of-the-art approaches.
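
Two steps in the abstract above lend themselves to a small sketch: building a reconstructed phase space by time-delay embedding, and combining per-sub-band results without weights. This is not the authors' implementation. The function names `delay_embed` and `combine_unweighted` are hypothetical, and treating the combined quantities as per-class log-likelihood scores that are simply summed is an assumption; the abstract does not say which quantities are combined.

```python
def delay_embed(signal, dim, lag):
    """Time-delay embedding: map a scalar series to points
    [x(n), x(n - lag), ..., x(n - (dim - 1) * lag)]."""
    start = (dim - 1) * lag  # first index with a full history available
    return [
        [signal[n - k * lag] for k in range(dim)]
        for n in range(start, len(signal))
    ]


def combine_unweighted(subband_scores):
    """Sum per-class scores across sub-bands with equal weight and
    return (best_label, totals). ASSUMPTION: scores are log-likelihoods,
    one dict {label: score} per sub-band, all with the same labels."""
    labels = list(subband_scores[0])
    totals = {
        label: sum(scores[label] for scores in subband_scores)
        for label in labels
    }
    best_label = max(totals, key=totals.get)
    return best_label, totals


# Toy usage: a 3-dimensional embedding with lag 1 of a short ramp.
points = delay_embed([0, 1, 2, 3, 4], dim=3, lag=1)

# Toy usage: two sub-bands scoring two made-up phoneme labels.
best, totals = combine_unweighted([
    {"aa": -10.0, "iy": -12.0},
    {"aa": -11.0, "iy": -8.0},
])
```

In a fuller system each sub-band signal would be embedded separately and scored by its own Gaussian mixture models; the scores here are made-up numbers used only to show the combination step.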

“…Recent research on the time series generated by the mechanisms of human voice production has applied techniques from nonlinear dynamics and chaos theory with a variety of aims, notably: phoneme classification (Johnson et al., 2005; Kokkinos and Maragos, 2005), automatic speaker recognition (Petry, 2002), discrimination between healthy and pathological voices, diagnosis of laryngeal pathologies, and evaluation of the effects of clinical treatments (Dajer, 2006; Henríquez et al., 2009; Jiang et al., 2006; Scalassara et al., 2008; Torres et al., 2003; Zhang and Jiang, 2008).…”

Section: Introduction (unclassified)

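Classificação de sinais de vozes saudáveis e patológicas por meio da combinação entre medidas da análise dinâmica não linear e codificação preditiva linear [Classification of healthy and pathological voice signals by combining nonlinear dynamic analysis measures and linear predictive coding]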

Costa

Costa

Assis

et al. 2013

RBEB

“…The analysis may be followed by measurement of invariant quantities on the reconstructed space. Early works in the field employing phase-space reconstruction include [21,22,26,27,17,18,28], whereas recently there has been increasing interest in the area [19,29,30]. These employ concepts on Lyapunov exponents [18,19,29], density models of the phase-space [30], correlation dimension measurements [18,28], especially for fricative consonants [17], or surrogate analysis on the nonlinear dynamics of vowels [31].…”

Section: Introduction (mentioning)

confidence: 99%

Analysis and classification of speech signals by generalized fractal dimension features

Pitsikalis

Maragos

2009

Speech Communication

We explore nonlinear signal processing methods inspired by dynamical systems and fractal theory in order to analyze and characterize speech sounds. A speech signal is first embedded in a multidimensional phase-space and then used to estimate measurements related to the fractal dimensions. Our goals are to compute these raw measurements in the practical cases of speech signals, to further utilize them for the extraction of simple descriptive features, and to address issues on the efficacy of the proposed features to characterize speech sounds. We observe that distinct feature vector elements obtain values or show statistical trends that on average depend on general characteristics such as the voicing, the manner, and the place of articulation of broad phoneme classes. Moreover, the way that the statistical parameters of the features are altered as an effect of the variation of phonetic characteristics seems to follow some roughly formed patterns. We also discuss some qualitative aspects concerning the linear phoneme-wise correlation between the fractal features and the commonly employed mel-frequency cepstral coefficients (MFCC), demonstrating phonetic cases of maximal and minimal correlation. In the same context we also investigate the fractal features' spectral content, in terms of the most and least correlated components with the MFCC.
