Detecting Japanese local speech rate deceleration in spontaneous conversational speech using a variable threshold

Takamaru, Keiichi; Hiroshige, Makoto; Araki, Kenji; Tochinai, Koji

doi:10.21437/eurospeech.2001-182

Cited by 2 publications

References 5 publications


Study on parameters of the variable threshold to detect local speech rate deceleration in Japanese spontaneous conversational speech

Takamaru, Hiroshige, Araki, et al. 2003

Acoust. Sci. & Tech.


Introduction

In human communication, speech conveys not only linguistic information but also emphasis, intention, attitude, and so on. These are called paralinguistic information [1]. There are several studies on paralinguistic information [2,3]. Methods for modeling or detecting paralinguistic information are useful for various applications in man-machine communication, such as speech synthesis with rich expression and recognition of paralinguistic information in spontaneous speech. A speaker controls prosodic features such as fundamental frequency, power, and temporal structure to express paralinguistic information. It is said that there is little speech rate variation in Japanese read speech. In spontaneous conversational speech, however, a speaker sometimes varies the speech rate greatly to obtain a listener's attention. We previously found that the speech rate of important words or portions of sentences is slowed to obtain the listener's attention [4]. For a computer to understand paralinguistic information, one important issue is detecting the portions of sentences in which the speaker intentionally decelerates the speech rate. There are several studies on local speech rate variation [5][6][7]. However, there are few studies on the detection of local speech rate variation.

We try to detect a locally slower portion from a time series of mora durations [4,8]. When the speech rate of one portion is slower than that of other portions, the mora durations in that portion are longer than those of other morae. However, it is known that variation in a time series of mora durations is caused not only by intentionally controlled speech rate variation but also by other factors, such as differences between phonemes, the length of a phrase or sentence, and the position of a mora within a phrase or sentence [9]. We have proposed the variable threshold (VT) [8] for detecting, from observed mora durations, a locally slower portion decelerated by a speaker. The VT is applied to the time series of mora durations. A mora whose duration exceeds the VT is detected as a local slower portion. The outline of the VT is described in section 2. In this paper, we examine the properties of the VT parameters that determine the range and speed of variation of the VT. Three sets of parameters are prepared. We assume that these sets of parameters correspond to the levels of a listener's attention to
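
The detection rule described in the excerpt (a mora is flagged when its duration exceeds the threshold at that position) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the excerpt does not define how the VT itself is computed, so `running_mean_threshold` is a hypothetical stand-in (a scaled mean of the preceding morae) and is not the authors' VT. The function names, the `window` and `scale` parameters, and the sample durations are likewise invented for illustration.

```python
from typing import List, Sequence


def running_mean_threshold(
    durations: Sequence[float], window: int = 5, scale: float = 1.3
) -> List[float]:
    """Hypothetical placeholder threshold, NOT the VT from the paper.

    For each mora, the threshold is `scale` times the mean duration of up
    to `window` preceding morae. The first mora has no predecessors, so
    its own duration is used as the context.
    """
    thresholds = []
    for i in range(len(durations)):
        start = max(0, i - window)
        context = list(durations[start:i]) or [durations[0]]
        thresholds.append(scale * sum(context) / len(context))
    return thresholds


def detect_slow_morae(
    durations: Sequence[float], thresholds: Sequence[float]
) -> List[int]:
    """Return indices of morae whose duration exceeds the threshold.

    This mirrors the rule stated in the text; any threshold sequence of
    matching length can be supplied.
    """
    if len(durations) != len(thresholds):
        raise ValueError("durations and thresholds must have equal length")
    return [i for i, (d, t) in enumerate(zip(durations, thresholds)) if d > t]
```

For example, with the made-up durations `[0.12, 0.11, 0.13, 0.12, 0.25, 0.24, 0.12]` (in seconds) and `running_mean_threshold(durations, window=3, scale=1.3)`, `detect_slow_morae` returns `[4, 5]`, the two lengthened morae. Keeping the threshold computation separate from the comparison means a different threshold definition can be substituted without changing the detection step.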
