Research on speech and emotion is moving from a period of exploratory research into one where there is a prospect of substantial applications, notably in human-computer interaction. Progress in the area relies heavily on the development of appropriate databases. This paper addresses four main issues that need to be considered in developing databases of emotional speech: scope, naturalness, context and descriptors. The state of the art is reviewed. A good deal has been done to address the key issues, but there is still a long way to go. The paper shows how the challenge of developing appropriate databases is being addressed in three major recent projects: the Reading-Leeds project, the Belfast project and the CREST-ESP project. From these and other studies the paper draws together the tools and methods that have been developed, addresses the problems that arise and indicates the future directions for the development of emotional speech databases. © 2002 Elsevier Science B.V. All rights reserved.
Résumé. The study of speech and emotion, having begun as exploratory research, is now reaching the stage of substantial applications, notably in human-computer interaction. Progress in this area depends closely on the development of appropriate databases. This article addresses four main points that deserve attention in this regard: scope, authenticity, context and descriptive terms. It reviews the current state of the field and discusses the advances made and those still to be made. The article shows how three major recent projects (Reading-Leeds, Belfast and CREST-ESP) have taken up the challenge posed by building appropriate databases. Drawing on these three projects and on other work, the authors take stock of the tools and methods used, identify the associated problems, and indicate the direction that future research should take.
Abstract. The HUMAINE project is concerned with developing interfaces that will register and respond to emotion, particularly pervasive emotion (forms of feeling, expression and action that colour most of human life). The HUMAINE Database provides naturalistic clips which record that kind of material, in multiple modalities, and labelling techniques that are suited to describing it.
For many applications of emotion recognition, such as virtual agents, the system must select responses while the user is speaking. This requires reliable on-line recognition of the user's affect. However, most emotion recognition systems are based on turn-wise processing. We present a novel approach to on-line emotion recognition from speech using Long Short-Term Memory Recurrent Neural Networks. Emotion is recognised frame-wise in a two-dimensional valence-activation continuum. In contrast to current state-of-the-art approaches, recognition is performed on low-level signal frames, similar to those used for speech recognition. No statistical functionals are applied to low-level feature contours. Framing at a higher level is therefore unnecessary, and regression outputs can be produced in real time for every low-level input frame. We also investigate the benefits of including linguistic features, obtained by a keyword spotter, at the signal frame level.
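The frame-wise idea in this abstract can be illustrated with a minimal sketch: a single LSTM layer that consumes one low-level feature frame at a time and emits a (valence, activation) pair for each frame, with no turn-level functionals. Everything specific here is an assumption made for illustration, not the authors' system: the function names (`init_params`, `recognise_framewise`), the layer sizes, the three-dimensional toy frames, and the random untrained weights.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vec):
    """Return weights @ vec + bias for a list-of-lists weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_params(n_in, n_hidden, n_out=2, seed=0):
    """Random, untrained weights (placeholder values, not a trained model)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    # One weight matrix and bias per gate: input, forget, output, candidate.
    gates = {name: (matrix(n_hidden, n_in + n_hidden), [0.0] * n_hidden)
             for name in ("input", "forget", "output", "candidate")}
    # Linear readout to the two regression targets: valence and activation.
    readout = (matrix(n_out, n_hidden), [0.0] * n_out)
    return gates, readout


def recognise_framewise(frames, gates, readout):
    """Yield one (valence, activation) estimate per low-level input frame."""
    n_hidden = len(gates["input"][1])
    hidden = [0.0] * n_hidden
    cell = [0.0] * n_hidden
    for frame in frames:
        z = list(frame) + hidden  # current frame plus previous hidden state
        i = [sigmoid(a) for a in affine(*gates["input"], z)]
        f = [sigmoid(a) for a in affine(*gates["forget"], z)]
        o = [sigmoid(a) for a in affine(*gates["output"], z)]
        g = [math.tanh(a) for a in affine(*gates["candidate"], z)]
        # Standard LSTM update: the cell state carries context across frames.
        cell = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, cell, i, g)]
        hidden = [oj * math.tanh(cj) for oj, cj in zip(o, cell)]
        valence, activation = affine(*readout, hidden)
        yield valence, activation


if __name__ == "__main__":
    gates, readout = init_params(n_in=3, n_hidden=4)
    toy_frames = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.4, 0.4, 0.4]]
    for valence, activation in recognise_framewise(toy_frames, gates, readout):
        print(round(valence, 3), round(activation, 3))
```

Because the recurrence only looks backwards, the estimate for a given frame does not change when later frames arrive, which is the property that makes output available while the user is still speaking. A real system would train the weights on labelled data and use acoustic features rather than toy vectors.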
Statistical methods of describing prosody were used to study fluency, expressiveness and their relationship among 8-10-year-old readers. Sixty-seven children were rated on fluency and expressiveness. The two were partially independent in the full sample: expressiveness rarely occurred without fluency, but fluency occurred without expressiveness. A balanced subsample of 24 was selected for closer instrumental and statistical analysis. There were robust relationships between fluency and measures associated with temporal organization, and between expressiveness and variables associated with pitch mobility. Interactions indicated that the relationships were not simple. Differences between groups depended on sentence content and position: expressive readers distinguished sentences more sharply according to content, and the groups diverged on some measures as the passage progressed. Also, measures associated primarily with either fluency or expression often showed secondary sensitivity to the other: temporal organization was associated with fluency, but worsened over time among inexpressive readers; and readers who were both fluent and expressive were distinctive in several respects. Some measures offer a basis for rules aimed at assigning individuals to skill categories, particularly the magnitude of pitch movements and reading time per syllable. The rules distinguish well among readers who were either at one of the extremes of skill, or fluent but inexpressive; it is harder to discriminate among the other readers (who have mixed skill patterns). The effects suggest psychological hypotheses about the underlying mechanisms.
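The kind of rule mentioned in this abstract, which assigns a reader to a skill category from reading time per syllable and the magnitude of pitch movements, might look like the sketch below. The abstract gives no thresholds, units or rule form, so the two cut-off values, the use of semitones, the direction of each comparison, and the names `ReaderMeasures` and `skill_category` are all hypothetical placeholders chosen only to make the example runnable.

```python
from dataclasses import dataclass

# Placeholder cut-offs, NOT values from the study.
MAX_FLUENT_SECONDS_PER_SYLLABLE = 0.30   # assumed: faster reading = more fluent
MIN_EXPRESSIVE_PITCH_MOVEMENT_ST = 4.0   # assumed unit: semitones


@dataclass
class ReaderMeasures:
    seconds_per_syllable: float      # reading time per syllable
    pitch_movement_semitones: float  # typical magnitude of pitch movements


def skill_category(measures):
    """Assign a reader to one of four categories using two threshold rules."""
    fluent = measures.seconds_per_syllable <= MAX_FLUENT_SECONDS_PER_SYLLABLE
    expressive = (measures.pitch_movement_semitones
                  >= MIN_EXPRESSIVE_PITCH_MOVEMENT_ST)
    if fluent and expressive:
        return "fluent and expressive"
    if fluent:
        return "fluent but inexpressive"
    if expressive:
        return "expressive but not fluent"
    return "neither fluent nor expressive"


if __name__ == "__main__":
    print(skill_category(ReaderMeasures(0.22, 6.5)))
    print(skill_category(ReaderMeasures(0.25, 2.0)))
```

Hard thresholds of this kind suit the cases the abstract says are separated well (the extremes, and fluent but inexpressive readers); readers with mixed skill patterns fall near the boundaries, where a two-threshold rule is least reliable.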
We highlight two broader domains surrounding specific attributions of emotion and the specific features of speech that underlie them, and argue for caution over compartmentalising these broader domains. It seems to be a general rule that variations in what we call the augmented prosodic domain (APD) are emotive, perhaps because they signal departure from a reference point corresponding to a well-controlled, neutral state. Our studies show that various departures from that reference point are reflected in the APD, including central and sensory impairments (schizophrenia and deafness) as well as emotion. Intuitively it seems right to acknowledge that departures from well-controlled neutrality are highly confusable, and it is unclear that phonetics should try to draw those distinctions more sharply than listeners tend to. A system called ASSESS automatically measures properties in the APD, opening the way to explore it in an empirical spirit.