“…The techniques used for experimentation with vocal synthesis include, e.g., VOSIM [20], Frequency Modulation (FM) [21], the Klatt filter model [22], Time-Domain Formant-Wave-Function Synthesis (FOF) [23], Phase-Aligned Formant Synthesis (PAF) [24], and Spectral Modeling followed by Additive Synthesis [25]. These algorithms have been evaluated with respect to computational cost, suitability of parameterization, and, of course, the resulting sound quality.…”
We present a prototype of a humanoid robot head equipped with human-like speech-sound localization and production systems, designed for a new generation of robots that should autonomously evolve language and other cognitive skills. Like the human auditory apparatus, the robot head contains a binaural sensor system based on a frequency-domain binaural model. This enables the robot to detect and locate a speaker autonomously from the speech signals the speaker produces. Humans, however, analyze the temporal regularity of incoming sounds on different time scales: periods in the millisecond range give rise to the sensation of pitch, while periods on the order of seconds give rise to the sensation of rhythm. In addition, unlike in human hearing, detecting and localizing multiple sound signals is a nontrivial problem for machine audition. We therefore discuss a possible implementation of human-like spatiotemporal processing of sounds in single- and multi-source scenarios. Our future goals are to combine the constructed speech synthesis and physical audio systems, and to establish an algorithm for detailed spatiotemporal localization of both single and concurrent speech sound sources, with roughly human-like temporal and spatial processing capabilities.
“…To produce a glottal pulse, the phase increment of the cosine is modulated to permit a local time-scale speedup or delay. This method, used in the VOSIM model presented by Kaegi and Tempelaars (1978), was suggested by Peter Pabon (1994). In the human voice, as well as in most musical instruments, an increase in sound level is associated with a decrease in spectral tilt. As a result, the higher partials gain more in amplitude during a crescendo than the lower partials.…”
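The phase-increment idea in the excerpt above can be sketched in a few lines. The following minimal example (the function name, the open-quotient and lobe-count parameters, and the raised-cosine shape are our own illustrative assumptions, not the cited VOSIM or Pabon implementations) advances a cosine's phase quickly during the open part of each glottal cycle and freezes it afterwards, so that each fundamental period contains one localized pulse:

```python
import numpy as np

def phase_modulated_pulse(sr=16000, f0=110.0, open_quotient=0.4, lobes=3.0):
    """One glottal cycle via local phase-increment modulation (sketch).

    During the open fraction of the period the phase advances fast enough
    to complete `lobes` full cosine cycles; afterwards the increment is
    zero, so the waveform stays at rest until the next period.
    """
    period = int(sr / f0)                     # samples per fundamental period
    n_open = int(open_quotient * period)      # samples in the "open" phase
    inc = np.zeros(period)
    inc[:n_open] = 2 * np.pi * lobes / n_open # fast phase advance (speedup)
    phase = np.cumsum(inc)                    # frozen phase in the closed part
    return 0.5 * (1.0 - np.cos(phase))        # raised-cosine pulse lobes
```

Concatenating such cycles yields a pulse train at the fundamental frequency; the excerpt's level-dependent spectral tilt would additionally require scaling the higher lobes more strongly as amplitude grows.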
Section: The Musse and Musse Dig Singing Synthesizers
“…VPM is similar to earlier techniques such as FOF [32] and VOSIM [33], in which the voice is modeled as a sequence of pulses whose timbre is roughly represented by a set of ideal resonances. In VPM, however, the timbre is represented by all of the harmonics, allowing it to capture subtle details and nuances of both the amplitude and the phase spectra.…”
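The representation the excerpt ascribes to VPM, one pulse per period with timbre carried by per-harmonic amplitudes and phases, amounts to additive resynthesis over a single period. A minimal sketch (the function name and signature are illustrative assumptions, not the paper's API):

```python
import numpy as np

def resynthesize_pulse(f0, amps, phases, sr=16000):
    """Rebuild one voice pulse from per-harmonic amplitude and phase.

    Harmonic k (1-based) contributes amps[k-1] * cos(2*pi*k*f0*t + phases[k-1]).
    Keeping the phases, not just the amplitudes, is what lets this kind of
    representation preserve the pulse's fine waveform shape.
    """
    period = int(sr / f0)
    t = np.arange(period) / sr
    pulse = np.zeros(period)
    for k, (a, ph) in enumerate(zip(amps, phases), start=1):
        pulse += a * np.cos(2 * np.pi * k * f0 * t + ph)
    return pulse
```

With a single harmonic of unit amplitude and zero phase, the result is one cycle of a cosine; richer amplitude/phase sets shape sharper, more voice-like pulses.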