This paper reports the development of a quantitative target approximation (qTA) model for generating F0 contours of speech. The qTA model simulates the production of tone and intonation as a process of syllable-synchronized sequential target approximation [Xu, Y. (2005). "Speech melody as articulatorily implemented communicative functions," Speech Commun. 46, 220-251]. It adopts a set of biomechanical and linguistic assumptions about the mechanisms of speech production. The communicative functions directly modeled are lexical tone in Mandarin, lexical stress in English, and focus in both languages. The qTA model is evaluated by extracting function-specific model parameters from natural speech via supervised learning (automatic analysis by synthesis) and comparing the F0 contours generated with the extracted parameters to those of natural utterances through numerical evaluation and perceptual testing. The F0 contours generated by the qTA model with the learned parameters were very close to the natural contours in terms of root mean square error, rate of human identification of tone and focus, and judgment of naturalness by human listeners. The results demonstrate that the qTA model is both an effective tool for research on tone and intonation and a potentially effective system for automatic synthesis of tone and intonation.
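The numerical part of this evaluation compares a generated contour with a natural one by root mean square error. A minimal sketch of that measure follows, assuming both contours are sampled at the same time points and in the same unit (the abstract does not say whether Hz or semitones are used); the function name `rmse` is hypothetical and not taken from the paper.

```python
import math

def rmse(generated, natural):
    """Root mean square error between two equally sampled F0 contours.

    Assumption: both sequences hold F0 values at the same time points
    and in the same unit; the abstract specifies neither.
    """
    if len(generated) != len(natural) or not generated:
        raise ValueError("contours must be non-empty and of equal length")
    squared_error = sum((g - n) ** 2 for g, n in zip(generated, natural))
    return math.sqrt(squared_error / len(generated))
```

A lower value means the generated contour tracks the natural one more closely; identical contours give 0.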
Our current understanding of how emotions are expressed in speech is still very limited. Part of the difficulty has been the lack of understanding of the underlying mechanisms. Here we report the findings of a somewhat unconventional investigation of emotional speech. Instead of looking for direct acoustic correlates of multiple emotions, we tested a specific theory, the size code hypothesis of emotional speech, about two emotions – anger and happiness. According to the hypothesis, anger and happiness are conveyed in speech by exaggerating or understating the body size of the speaker. In two studies consisting of six experiments, we synthesized vowels with a three-dimensional articulatory synthesizer with parameter manipulations derived from the size code hypothesis, and asked Thai listeners to judge the body size and emotion of the speaker. Vowels synthesized with a longer vocal tract and lower F0 were mostly heard as from a larger person if the length and F0 differences were stationary, but from an angry person if the vocal tract was dynamically lengthened and F0 was dynamically lowered. The opposite was true for the perception of small body size and happiness. These results provide preliminary support for the size code hypothesis. They also point to potential benefits of theory-driven investigations in emotion research.
Saowaluk C. WATANAPA, Member, Bundit THIPAKORN, and Nipon CHAROENKITKARN, Nonmembers. SUMMARY: Effective classification and analysis of semantic content are very important for the content-based indexing and retrieval of video databases. Our research attempts to classify movie clips into three groups of commonly elicited emotions, namely excitement, joy and sadness, based on a set of abstract-level semantic features extracted from the film sequence. In particular, these features consist of six visual and audio measures grounded in artistic film theories. A unique sieving-structured neural network is proposed as the classifying model due to its robustness. The performance of the proposed model is tested with 101 movie clips excerpted from 24 award-winning and well-known Hollywood feature films. The experimental result, a 97.8% correct classification rate measured against the collected human judgments, indicates the great potential of using abstract-level semantic features as an engineered tool for video-content retrieval and indexing.
This article describes a method for off-line signature recognition that uses the Hough transform to detect stroke lines in a signature image. The Hough transform is used to extract the parameterized Hough space from the signature skeleton as a unique characteristic feature of each signature. In the experiment, a back-propagation neural network is used as a tool to evaluate the performance of the proposed method. The system has been tested with 70 test signatures from different persons. The experimental results show a recognition rate of 95.24%.
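The stroke-line detection step can be illustrated with the standard (rho, theta) line Hough transform. The sketch below is an assumption-laden illustration, not the authors' implementation: the abstract does not give the parameterization, bin sizes, or how the Hough space is turned into a feature vector, and the names `hough_accumulator` and `strongest_line` are hypothetical.

```python
import math

def hough_accumulator(points, n_theta=180, rho_step=1.0):
    """Vote in (rho, theta) space for a list of (x, y) skeleton pixels.

    Each pixel votes for every line rho = x*cos(theta) + y*sin(theta)
    that passes through it, with theta sampled over [0, 180) degrees.
    Bin sizes are illustrative assumptions.
    """
    max_rho = max(math.hypot(x, y) for x, y in points)
    offset = int(math.ceil(max_rho / rho_step))  # shifts negative rho to index >= 0
    acc = [[0] * n_theta for _ in range(2 * offset + 1)]
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            acc[int(round(rho / rho_step)) + offset][t] += 1
    return acc, offset

def strongest_line(points, n_theta=180, rho_step=1.0):
    """Return (votes, rho, theta_in_degrees) for the fullest accumulator bin."""
    acc, offset = hough_accumulator(points, n_theta, rho_step)
    votes, r, t = max(
        (acc[r][t], r, t) for r in range(len(acc)) for t in range(n_theta)
    )
    return votes, (r - offset) * rho_step, math.degrees(math.pi * t / n_theta)
```

For a horizontal stroke such as `[(x, 5) for x in range(20)]`, all 20 pixels fall into a single bin close to rho = 5, theta = 90 degrees. The accumulator as a whole, rather than only its peak, is what would serve as the "parameterized Hough space" feature fed to a classifier.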