Hidden Markov model-based speech emotion recognition

Schuller, Björn W.; Rigoll, Gerhard; Lang, M.

doi:10.1109/icassp.2003.1202279

Cited by 246 publications

(191 citation statements)

References 4 publications

Citation statement types: Supporting, Mentioning (188), Contrasting, Unclassified


“…Response of the subject is measured using different modalities, like psychophysiological responses (heart rate, blood pressure, skin conductance and temperature), but also through verbal and nonverbal responses. The analysis of vocal features can give important information about the emotional state of a person [11]. Emotions can be expressed non-verbally, through the prosodic structure of an utterance, but also verbally, by directly expressing thoughts and feelings.…”

Section: Introduction (mentioning)

confidence: 99%

Development of Acoustic Model for Croatian Language Using HTK

Dropuljić, Petrinovic

2010

Automatika


Original scientific paper. This paper presents the development of an acoustic model for the Croatian language for automatic speech recognition (ASR). Continuous speech recognition is performed by means of Hidden Markov Models (HMMs) implemented in the HMM Toolkit (HTK). To adapt HTK to the native language, a novel algorithm for Croatian language transcription (CLT) has been developed. It is based on phonetic assimilation rules applied within uttered words. Phonetic questions for state tying of different triphone models have also been developed. An automated system for training and evaluating acoustic models has been developed and integrated with a new graphical user interface (GUI). The target applications of this ASR system are stress inoculation training (SIT) and virtual reality exposure therapy (VRET). Adaptability of the model to a closed set of speakers is important for such applications, and this paper investigates the applicability of the HTK tool in typical scenarios. The tool's robustness to a new language was tested under matched conditions by training, in parallel, an English model that served as a baseline. Ten native Croatian speakers participated in the experiments. Encouraging results were achieved and are reported for the developed Croatian model.

Key words: Acoustic model, Automatic speech recognition, Croatian language, Hidden Markov models, Phonetic assimilation, Phonetic transcription algorithm, Recognition accuracy
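The abstract states that the CLT algorithm applies phonetic assimilation rules within words but does not list the rules. As an illustration only, the sketch below applies an ordered table of context-dependent rewrite rules to a lowercase word. The single rule shown (n becomes m before the bilabials b and p, as in "jedanput" pronounced "jedamput"), the `RULES` table and the `transcribe` function are assumptions for this example, not the authors' rule set.

```python
import re

# Hypothetical rule table: (pattern, replacement) pairs applied in order.
# Only one illustrative rule is included; the actual CLT rules are not
# given in the abstract.
RULES = [
    # Nasal place assimilation: n -> m when followed by b or p.
    (re.compile(r"n(?=[bp])"), "m"),
]

def transcribe(word: str) -> str:
    """Apply each within-word assimilation rule, in order, to a word."""
    phones = word.lower()
    for pattern, replacement in RULES:
        phones = pattern.sub(replacement, phones)
    return phones

print(transcribe("jedanput"))  # jedamput
```

Ordering matters in such a table, because one rule's output can create or destroy the context of a later rule. A realistic transcriber would also need to treat digraphs such as lj, nj and dž as single phonemes before applying any rules.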


“…Although utterance-level approaches are the most common (Schuller et al., 2005; Cichosz and Slot, 2005; Oudeyer, 2003), segment-based approaches are becoming more popular. Segment-based approaches try to model the shape of acoustic contours more closely, as in (Katz et al., 1996; Schuller et al., 2003; Batliner et al., 2003; Batliner et al., 2005; Rotaru and Litman, 2005). In all of the mentioned studies, a single speech corpus is used for training and testing a machine-learned classifier.…”

Section: Introduction (mentioning)

confidence: 99%

An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech

Shami, Verhelst

2007

Speech Communication

110


In this study, the robustness of approaches to the automatic classification of emotions in speech is addressed. Among the many types of emotions that exist, two groups are considered: adult-to-adult acted vocal expressions of common emotions such as happiness, sadness, and anger, and adult-to-infant vocal expressions of affective intents, also known as "motherese". Specifically, we estimate the generalization capability of two feature extraction approaches: the approach developed for Sony's robotic dog AIBO (AIBO) and the segment-based approach (SBA) of . Three machine learning approaches are considered, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and AdaBoosted decision trees, and four emotional speech databases are employed: Kismet, BabyEars, Danish, and Berlin.

Single-corpus experiments show that the AIBO and SBA feature extraction approaches are competitive on all four databases and that their performance is comparable with previously published results on the same databases. The best choice of machine learning algorithm seems to depend on the feature extraction approach.

Multi-corpus experiments are performed with the Kismet-BabyEars and Danish-Berlin database pairs, which contain parallel emotional classes. Automatic clustering of the emotional classes in the database pairs shows that the patterns behind the emotions in the Kismet-BabyEars pair are less database-dependent than those in the Danish-Berlin pair. In off-corpus testing, the classifier is trained on one database of a pair and tested on the other. This provides little improvement over baseline classification.
In integrated-corpus testing, however, the classifier is trained on the merged databases, and this gives promisingly robust classification results. These results suggest that emotional corpora with parallel emotion classes recorded under different conditions can be used to construct a single classifier capable of distinguishing the emotions in the merged corpora. Such a classifier is more robust than one trained on a single corpus, as it can recognize more varied expressions of the same emotional classes. These findings suggest that existing approaches to the classification of emotions in speech are efficient enough to handle larger amounts of training data without any reduction in classification accuracy.
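The difference between off-corpus and integrated-corpus testing can be made concrete with KNN, one of the three learners named in the abstract. The sketch below is an assumption-laden illustration: the 2-D feature vectors are invented toy data (they do not come from AIBO or SBA features or from any of the four databases), and leave-one-out evaluation on the merged set is one possible way to keep test items out of training, not necessarily the protocol the authors used.

```python
import math
from collections import Counter

def knn_predict(train, x, k=3):
    """Majority vote among the k training vectors nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of test items whose predicted label matches the true one."""
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(test)

def leave_one_out_accuracy(data, k=3):
    """Hold out each item in turn and classify it with all the others."""
    correct = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, x, k) == y
    return correct / len(data)

# Invented toy "corpora" with parallel emotion classes.
corpus_a = [((0.0, 0.0), "neutral"), ((0.2, 0.1), "neutral"), ((0.1, 0.3), "neutral"),
            ((2.0, 2.0), "anger"), ((2.2, 1.9), "anger"), ((1.9, 2.3), "anger")]
corpus_b = [((0.5, 0.6), "neutral"), ((0.7, 0.4), "neutral"), ((0.6, 0.5), "neutral"),
            ((2.6, 2.4), "anger"), ((2.4, 2.7), "anger"), ((2.5, 2.5), "anger")]

# Off-corpus: train on one corpus of the pair, test on the other.
off_corpus = accuracy(corpus_a, corpus_b)
# Integrated corpus: train and evaluate on the merged corpora.
integrated = leave_one_out_accuracy(corpus_a + corpus_b)
print(f"off-corpus: {off_corpus:.2f}, integrated (leave-one-out): {integrated:.2f}")
```

With real data, the merged training set covers more varied realizations of each class, which is the intuition the abstract gives for the robustness of integrated-corpus training.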


“…This approach employs machine learning techniques such as Hidden Markov Models [13]. Speech and speaker recognition techniques, namely short-term features and statistical modeling (GMM, HMM), have been successfully combined with a traditional turn-level approach [15].…”

Section: Machine Learning Based Units (mentioning)

confidence: 99%
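The statement above, like the title of the cited paper, refers to HMM-based emotion recognition. One common way such a classifier can work is to train one HMM per emotion and label an utterance with the emotion whose model gives the highest forward likelihood. The sketch below shows only that scoring step. The discrete two-symbol observations (for example, quantized low/high energy), the toy parameters and the function names are assumptions for illustration; it is not the method of Schuller et al.

```python
import math

def log_forward(obs, start, trans, emit):
    """Log-likelihood log P(obs | model) via the scaled forward algorithm.

    start[i]    : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability that state i emits discrete symbol o
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    scale = sum(alpha)
    log_lik = math.log(scale)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify(obs, models):
    """Return the emotion whose HMM assigns obs the highest likelihood."""
    return max(models, key=lambda name: log_forward(obs, *models[name]))

# Toy two-state models over symbols 0 (low energy) and 1 (high energy).
models = {
    "neutral": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.7, 0.3]]),
    "anger":   ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.3, 0.7]]),
}
print(classify([1, 1, 1, 0, 1], models))  # anger
```

The per-step rescaling keeps the computation numerically stable for long sequences; the sketch assumes every observation has nonzero probability under each model, otherwise `math.log` would fail on a zero scale factor.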

“…Indeed, the standard unit is the speaker turn [12][13][14], which consists of characterizing a whole sentence by a large number of features. This approach assumes that the emotional state does not change during the speaker turn.…”

Section: Units For Emotional Speech Characterization (mentioning)

confidence: 99%
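Turn-level characterization, as described in the statement above, summarizes frame-level contours over the whole turn with statistical functionals. The sketch below computes a handful of such functionals for one contour. The particular functionals, the function name and the pitch values are assumptions chosen for illustration; real systems typically compute many more features over several contours (pitch, energy, spectral measures).

```python
import statistics

def turn_level_features(contour):
    """Summarize a frame-level contour (e.g. pitch in Hz) over a whole turn."""
    return {
        "mean": statistics.fmean(contour),
        "std": statistics.pstdev(contour),
        "min": min(contour),
        "max": max(contour),
        "range": max(contour) - min(contour),
    }

pitch = [180.0, 190.0, 210.0, 200.0, 170.0]  # invented pitch values, one per frame
print(turn_level_features(pitch))
```

Because every frame contributes to the same summary, this representation carries exactly the assumption quoted above: the emotional state is treated as constant within the turn.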

Time-Scale Feature Extractions for Emotional Speech Characterization

2009


Emotional speech characterization is an important issue for the understanding of interaction. This article discusses the time-scale analysis problem in feature extraction for emotional speech processing. We describe a computational framework for combining segmental and supra-segmental features for emotional speech detection. The statistical fusion is based on the estimation of local a posteriori class probabilities, and the overall decision employs weighting factors directly related to the duration of the individual speech segments. This strategy is applied to a real-world application: detection of Italian motherese in authentic, longitudinal parent-infant interaction at home. The results suggest that short- and long-term information, represented respectively by the short-term spectrum and the prosodic parameters (fundamental frequency and energy), provide a robust and efficient time-scale analysis. A similar fusion methodology is also investigated using a phoneme-specific characterization process. This strategy is motivated by the fact that emotional states vary at the phoneme level. A time scale based on both vowels and consonants is proposed, and it provides a relevant and discriminative feature space for acted emotion recognition. The experimental results on two databases, Berlin (German) and Aholab (Basque), show that the best performance is obtained by our phoneme-dependent approach. These findings demonstrate the relevance of taking phoneme dependency (vowels/consonants) into account for emotional speech characterization.
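The abstract describes fusing local a posteriori class probabilities with weights related to segment duration, without giving the exact formula. The sketch below assumes the simplest such rule, a duration-weighted arithmetic mean, P(c | turn) = sum over segments s of (d_s / D) * P(c | segment s), where d_s is the duration of segment s and D is the total duration, followed by picking the most probable class. The function name, the class labels and the numbers are hypothetical.

```python
def fuse(segments):
    """Duration-weighted average of per-segment class posteriors.

    segments: list of (duration_seconds, {class_label: posterior}) pairs.
    Returns (best_label, fused_posteriors).
    """
    total = sum(duration for duration, _ in segments)
    fused = {}
    for duration, posteriors in segments:
        weight = duration / total  # longer segments get more say
        for label, p in posteriors.items():
            fused[label] = fused.get(label, 0.0) + weight * p
    best = max(fused, key=fused.get)
    return best, fused

# Invented posteriors for three segments of one turn.
segments = [
    (0.4, {"motherese": 0.9, "other": 0.1}),
    (1.2, {"motherese": 0.3, "other": 0.7}),
    (0.2, {"motherese": 0.8, "other": 0.2}),
]
label, scores = fuse(segments)
print(label)  # other
```

In this toy turn, two short segments favor "motherese", but the single long segment dominates the weighted sum, which is exactly the effect duration weighting is meant to have. Other fusion rules (for example, a weighted sum of log-posteriors) are equally plausible readings of the abstract.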
