We have developed a new three-dimensional talking robot, Waseda Talker No. 6 (WT-6), which generates speech sounds by mechanically simulating articulatory motions as well as aero-acoustic phenomena in the vocal tract. WT-6 has a 17-DOF vocal mechanism. Its three-dimensional lips, tongue, jaw, and velum form the 3D vocal tract structure. It also has an independent jaw opening/closing mechanism, which controls the relative tongue position in the vocal tract as well as the oral opening. The previous robot in the series had a 2D tongue and could not achieve the precise closure needed to produce human-like consonants such as /s/ or /r/. The new tongue, which can be controlled to form 3D shapes, produces more realistic vocal tract shapes. The vocal cord model was also improved by adding a new pitch control mechanism that pushes on the vocal cords from the side. The pitch range is broader than that of the previous robot and is broad enough to reproduce normal human speech. Preliminary experimental results showed improved synthesized speech quality for the vowels /a/, /u/, and /o/.
We have developed the Waseda Talker series of talking robots, which mimic the human speech production mechanism. The robot consists of mechanical vocal cords, a tongue, lips, and a nasal cavity. The current version, WT-7R (Waseda Talker No. 7 Refined), has 16 DOF (7 DOF in the tongue, 1 DOF in the jaw, 4 DOF in the lips, 1 DOF in the velum, 2 DOF in the vocal cords, and 1 DOF in the lung) and the same vocal tract length as an average adult male. The mechanical vocal cord model is made of the styrene-based thermoplastic elastomer “Septon”, and its shape mimics human vocal folds in three dimensions. The vocal cord model has a pitch control mechanism that adjusts the cord tension and a glottal opening/closing mechanism; it can reproduce the vocal cord vibration of vocal fry (double-pitched) and breathy voice (incomplete glottal closure during the cycle), as well as modal voice with variable pitch. The three-dimensional tongue model is a rigid link mechanism covered with soft Septon rubber, and it can be controlled to form three-dimensional tongue shapes. We will describe the details of the robot's mechanism and show some demonstration videos.
We developed an anthropomorphic talking robot, Waseda Talker No. 6 (WT-6), which
generates speech sounds by mechanically simulating articulatory motions and aero-acoustic
phenomena. WT-6 possesses 17 degrees of freedom (DOF): a 5-DOF tongue, a 1-DOF jaw, 4-DOF
lips, a nasal cavity, and a 1-DOF soft palate as articulators; and 5-DOF vocal cords and 1-DOF lungs
as vocal organs. The vocal cords, tongue, and lips are made from the thermoplastic rubber Septon,
whose elasticity is similar to that of human tissue. WT-6 has three-dimensional (3D) lips, tongue, jaw,
and velum, which form the vocal tract structure. It also has an independent jaw opening/closing
mechanism. The previous robot in the series had a two-dimensional tongue and could not produce
human-like tongue shapes. The new tongue can form 3D shapes and is thus able to produce more
realistic vocal tract shapes. The vocal cord model consists of two folds, and is constructed with a
structure similar to the biomechanical structure of human vocal cords. These vocal cords can vibrate
in complex phases, similar to those of a human. With these mechanisms, the robot can reproduce
human speech in a more biomechanically faithful manner and can thus produce a voice closer to that of a
human.
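
As a quick check on the DOF breakdown given above, the following minimal Python sketch tallies the per-organ values stated in the abstract. The dictionary keys, the grouping tuples, and the `total_dof` helper are illustrative names, not part of the original work. The nasal cavity is listed without a DOF count, so the sketch assumes it is passive and omits it.

```python
# Per-organ DOF values for WT-6, copied from the abstract above.
# Assumption: the nasal cavity contributes no actuated DOF, so it is omitted.
WT6_DOF = {
    "tongue": 5,
    "jaw": 1,
    "lips": 4,
    "soft_palate": 1,
    "vocal_cords": 5,
    "lungs": 1,
}

# Grouping used in the abstract: articulators versus vocal organs.
ARTICULATORS = ("tongue", "jaw", "lips", "soft_palate")
VOCAL_ORGANS = ("vocal_cords", "lungs")


def total_dof(allocation, parts=None):
    """Sum the DOF of the named parts, or of every part if none are named."""
    if parts is None:
        parts = allocation.keys()
    return sum(allocation[name] for name in parts)


if __name__ == "__main__":
    print("articulators:", total_dof(WT6_DOF, ARTICULATORS))
    print("vocal organs:", total_dof(WT6_DOF, VOCAL_ORGANS))
    print("total:", total_dof(WT6_DOF))
```

Under these assumptions the articulators and vocal organs sum to the 17 DOF stated for WT-6.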