Abstract. In recent years, real-life applications such as talking books, gadgets and humanoid robots have drawn attention to research in expressive speech synthesis. Speech synthesis is already widely used, but there is a growing need for expressive speech synthesis, especially in communication and robotics. In this paper, global and local rules are developed to convert neutral speech to storytelling-style speech for the Malay language. To generate the rules, modifications of prosodic parameters such as pitch, intensity, duration, tempo and pauses are considered. These modifications are examined by performing prosodic analysis on a story collected from an experienced female storyteller and an experienced male storyteller. The global and local rules are applied at the sentence level, and the speech is synthesized using HNM. Subjective tests are conducted to evaluate the quality of the storytelling speech synthesized with each rule, based on naturalness, intelligibility, and similarity to the original storytelling speech. The results showed that the global rule gives better results than the local rule.
This paper describes the process undertaken and the criteria considered in acquiring a Malay language speech corpus towards the development of a humanoid storyteller. The speech corpus contains 464 speech sentences, 4,656 words and 9,584 syllables. Three children's short stories were recorded by 3 female storytellers, 1 male pr female speakers and 2 male speakers. The equipment specifications, recording procedures and speech annotations are described in detail in accordance with baseline work. The stories were recorded in two speaking styles, neutral and storytelling. A Malay language storytelling corpus is not only necessary for the development of storytelling text-to-speech (TTS) synthesis; it is also instrumental for natural language processing and speech recognition of the Malay language, an under
This paper presents the implementation of spontaneous speech emotion recognition (SER) on a smartphone running the iOS platform. The novelty of this work is that, at the time of writing, no similar work has been done using Malay language spontaneous speech. Developing SER on a mobile device is important for ease of use anytime and anywhere. The main factor to be considered is the computational complexity of classifying emotions in real time. Therefore, we introduce EmoLah, a Malay language spontaneous SER system that is able to recognize emotions on the go with a satisfactory accuracy rate. Pitch and energy prosody features are used to represent the emotions in the spontaneous speech, and a Naïve Bayes learning model is selected as the classifier. EmoLah is trained and tested using Malay language spontaneous speech acquired from television talk shows, live interviews from news broadcasts and mini-parliament sessions conducted by children. Four types of speech emotion are collected: happy, sad, angry and neutral. The total duration of all the emotional speech is four hours. Training is performed using MATLAB scripts, and the resulting weights are implemented in Xcode, the iOS application development software. EmoLah's accuracy is evaluated using a cross-validation test, and the results showed that it can discriminate angry, sad and happy. However, most emotions are misclassified as neutral.
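As a rough illustration of the classification step described above, the following Python sketch fits a Naïve Bayes classifier on two prosodic features. It is not the EmoLah implementation, which the abstract says was trained in MATLAB and deployed through Xcode. The Gaussian likelihood, the function names `fit_gaussian_nb` and `predict`, and every feature value are assumptions made for illustration only; the numbers are synthetic and not taken from the paper's data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(features, labels):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(features, labels):
        grouped[label].append(row)

    total = len(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-6)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic (mean pitch in Hz, mean energy in dB) pairs, invented for this sketch.
train_x = [
    (260, 75), (270, 78), (255, 74),   # angry
    (150, 55), (145, 52), (155, 57),   # sad
    (230, 68), (240, 70), (235, 66),   # happy
    (190, 62), (185, 60), (195, 63),   # neutral
]
train_y = ["angry"] * 3 + ["sad"] * 3 + ["happy"] * 3 + ["neutral"] * 3

model = fit_gaussian_nb(train_x, train_y)
print(predict(model, (265, 76)))
print(predict(model, (148, 54)))
```

Prediction needs only a handful of arithmetic operations per class and feature, which is consistent with the abstract's concern about computational complexity on a mobile device.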