Abstract. Although an increasing amount of research has been carried out into human-machine interaction over the last century, even today we are not able to fully understand the dynamic changes in human interaction. Only when we achieve this will we be able to go beyond a one-to-one mapping between text and speech and add social information to speech technologies. Social information is expressed to a high degree through prosodic cues and through movements of the body and the face. The aim of this paper is to use those cues to make one aspect of social information more tangible, namely participants' degree of involvement in a conversation. Our results for voice span and intensity, together with our preliminary results on body and face movement, suggest that these are reliable cues for detecting distinct levels of participants' involvement in conversation. This will allow for the development of a statistical model that is able to classify these stages of involvement. Our data indicate that involvement may be a scalar phenomenon.
Previous work has shown that read and spontaneous monologues differ prosodically in both production and perception. In this paper, we examine whether similar effects can be found between spontaneous and read, or rather acted, dialogues. It is possible that speakers can mimic conversational prosody very well. Alternatively, they might use prosodic resources more than the conversational situation actually requires (overacting). A third possibility is that in acted dialogues prosody is used less as a communicative device, since there is no need to establish common ground or to organize the floor between interlocutors. In our study, we examined spontaneous and read dialogues of equal verbal content. The task-oriented dialogues contained a communicative situation implicitly demanding a higher speaking rate (time pressure). Our results show that, globally, speakers met this conversational demand for an increased speaking rate in both the acted and the spontaneous situation, although global speaking rates differ between the spontaneous and the acted condition. Also, read speech exhibits a lower F0 minimum and, consequently, a larger F0 range than spontaneous speech, which may be explained by the lack of active turn-taking organization. In sum, acted conversational prosody resembles spontaneous interaction in many features, but also shows systematic differences.
We show how our optimization-based model of speech timing reproduces three effects of prosodic prominence on suprasegmental timing patterns in speech: (1) the durational interaction between lexical stress and pitch accent, (2) polysyllabic shortening in pitch-accented words, and (3) the differential behavior of prominent and non-prominent syllables under speaking rate variation. We review the literature and present model simulations that replicate the reported phenomena. The results underline the capacity of our model to provide a unified account of the temporal organization of speech.