A model of voiced-sound generation is derived in which the detailed acoustic behavior of the human vocal cords and the vocal tract is computed. The vocal cords are approximated by a self-oscillating source composed of two stiffness-coupled masses. The vocal tract is represented as a bilateral transmission line. One-dimensional Bernoulli flow through the vocal cords and plane-wave propagation in the tract are used to establish acoustic factors dominant in the generation of voiced speech. A difference-equation description of the continuous system is derived, and the cord-tract system is programmed for interactive study on a DDP-516 computer. Sampled waveforms are calculated for: acoustic volume velocity through the cord opening (glottis); glottal area; and mouth-output sound pressure. Functional relations between fundamental voice frequency, subglottal (lung) pressure, cord tension, glottal area, and duty ratio of cord vibration are also determined. Results show that the two-mass model duplicates principal features of cord behavior in the human. The variation of fundamental frequency with subglottal pressure is found to be 2 to 3 Hz/cm H2O, and is essentially independent of vowel configuration in the programmed tract. Acoustic interaction between tract eigenfrequencies and glottal volume flow is strong. Phase difference in motion of the cord edges is in the range of 0 to 60 degrees, and control of cord tension leads to behavior analogous to chest/falsetto conditions in the human. Phonation-neutral, or rest area of cord opening, is shown to be a critical factor in establishing self-oscillation. Finally, the complete synthesis system suggests an efficient, physiological description of the speech signal, namely, in terms of subglottal pressure, cord tension, rest area of cord opening, and vocal-tract shape.
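To give a feel for the magnitudes involved in Bernoulli flow through the glottis, the following is a minimal sketch of the ideal (lossless) orifice relation, U = A·sqrt(2P/ρ). It is not the two-mass model of the abstract: it ignores viscous losses, cord motion, and loading by the vocal tract. The function name, the air density value, and the example pressure and area are illustrative assumptions, not figures from the paper.

```python
import math

CMH2O_TO_PA = 98.0665  # pascals per centimetre of water
AIR_DENSITY = 1.14     # kg/m^3; assumed value for warm, moist air


def glottal_flow_cm3_per_s(subglottal_cm_h2o, area_cm2, rho=AIR_DENSITY):
    """Ideal Bernoulli volume velocity through an orifice, in cm^3/s.

    Hypothetical helper: treats the glottis as a fixed, lossless opening
    driven by the full subglottal pressure.
    """
    pressure_pa = subglottal_cm_h2o * CMH2O_TO_PA
    velocity_m_per_s = math.sqrt(2.0 * pressure_pa / rho)
    area_m2 = area_cm2 * 1e-4
    return velocity_m_per_s * area_m2 * 1e6  # m^3/s -> cm^3/s


if __name__ == "__main__":
    # Assumed example values: 8 cm H2O driving pressure, 0.1 cm^2 opening.
    print(glottal_flow_cm3_per_s(8.0, 0.1))
```

Under these assumptions the flow scales linearly with opening area and with the square root of subglottal pressure.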
Abstract: A self-oscillating model of the human vocal cords is derived and simulated on a digital computer. The model is used as a source of excitation for a vocal-tract synthesizer, also programmed on the computer. Synthetic speech from the simulation is used to study the influence of glottal parameters upon signal features. The cord model produces glottal volume velocity functions which reflect the acoustic interaction between source and tract. Voice pitch and irregularities in excitation are generated intrinsically from specification of subglottal pressure, cord tension, and tract configuration. Pitch produced by the cord model is a monotone increasing function of subglottal pressure and tension. Mean air flow and glottal duty factor depend upon a combination of parameters, but primarily upon the properties of the contacting surfaces during cord closure.
The quality of sound pickup in large rooms (such as auditoriums, conference rooms, or classrooms) is impaired by reverberation and interfering noise sources. These degradations can be minimized by a transducer system that discriminates against sound arrivals from all directions except that of the desired source. A two-dimensional array of microphones can be electronically beam-steered to accomplish this directivity. This report gives the theory, design, and implementation of a microprocessor system for automatically steering a two-dimensional microphone array. The signal-seeking transducer system is implemented as a dual-beam "track-while-scan" array. It utilizes signal properties to distinguish between desired speech sources and interfering noise.
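Electronic beam steering of a planar array can be illustrated with delay-and-sum processing, one common steering technique. The sketch below is not the report's microprocessor implementation, which is not described in the abstract. The function names, the far-field plane-wave assumption, the 343 m/s speed of sound, and rounding delays to whole samples are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(positions, azimuth_deg, elevation_deg, c=SPEED_OF_SOUND):
    """Per-microphone delays (seconds) that time-align a far-field plane wave.

    positions: (x, y) microphone coordinates in metres, array in the x-y plane.
    The direction is given as azimuth in the array plane and elevation above it.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # In-plane components of the unit vector pointing toward the source.
    ux = math.cos(el) * math.cos(az)
    uy = math.cos(el) * math.sin(az)
    # A microphone displaced toward the source hears the wavefront earlier.
    advance = [(x * ux + y * uy) / c for x, y in positions]
    # Delay each microphone by its advance; shift so no delay is negative.
    base = min(advance)
    return [a - base for a in advance]


def delay_and_sum(signals, delays, fs):
    """Average the microphone signals after whole-sample steering delays."""
    shifts = [round(d * fs) for d in delays]
    n = len(signals[0])
    out = [0.0] * n
    for sig, shift in zip(signals, shifts):
        for i in range(shift, n):
            out[i] += sig[i - shift]
    return [v / len(signals) for v in out]
```

Sound arriving from the steered direction adds coherently after the delays, while arrivals from other directions add with mismatched timing and are attenuated. A track-while-scan system would additionally need logic for choosing the direction, which this sketch leaves out.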
A type of vocoder is described that promises modest bandsaving and elimination of the pitch-tracking and voiced-unvoiced switching inherent in spectrum channel vocoders. A speech signal $f(t)$ suffers little degradation when passed through a parallel bank of contiguous bandpass filters and then recombined. If $f_n(t)$ is the output of the $n$th bandpass filter, the original signal $f(t)$ is approximated by $\sum_n f_n(t)$. Each $f_n(t)$ can be represented by two parameters: the value of the short-time amplitude spectrum of $f(t)$ evaluated at frequency $\omega_n$, and the time derivative of the short-time phase spectrum also evaluated at $\omega_n$. These data are transmitted to a synthesizer that produces approximations to each $f_n(t)$. The complex short-time spectrum calculated for each channel is $F(\omega_n,t) = \int_{-\infty}^{\infty} f(\lambda)\,h(t-\lambda)\,e^{-j\omega_n\lambda}\,d\lambda = |F(\omega_n,t)|\,e^{j\varphi(\omega_n,t)}$, where $h(t)$ is the impulse response of a realizable low-pass filter. The amplitude $|F(\omega_n,t)|$ and the phase derivative $\dot{\varphi}(\omega_n,t)$ are formed for each channel, low-pass-filtered, and transmitted to the synthesizer. At the synthesizer, $\dot{\varphi}(\omega_n,t)$ frequency-modulates an oscillator of nominal center frequency $\omega_n$, and $|F(\omega_n,t)|$ amplitude-modulates the same oscillator. The synthesized signals for all $n$ channels are then summed. In effect, the $|F(\omega_n,t)|$ signals carry the spectral envelope information, and the $\dot{\varphi}(\omega_n,t)$ signals carry the excitation information. Previous experience with channel vocoders shows that the $|F(\omega_n,t)|$ signals may be band-limited to around 20 cps. Our experiments with the phase vocoder indicate that the $\dot{\varphi}(\omega_n,t)$ signals may be similarly band-limited. Speech transmitted by the phase vocoder is demonstrated.
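The analysis and synthesis steps for one channel can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the integral is replaced by a sum over samples, the phase derivative is approximated by the wrapped phase difference between successive samples, the low-pass filter h is supplied by the caller as a finite list of coefficients, and the 20 cps band-limiting of the transmitted parameters is omitted. All function names are hypothetical.

```python
import cmath
import math


def short_time_spectrum(f, h, omega_n):
    """Discrete F(omega_n, k) = sum_m f[m] * h[k - m] * exp(-j * omega_n * m).

    omega_n is in radians per sample; h is a causal low-pass impulse response.
    """
    out = []
    for k in range(len(f)):
        acc = 0j
        for i, h_i in enumerate(h):  # h[i] weights sample f[k - i]
            m = k - i
            if m < 0:
                break
            acc += f[m] * h_i * cmath.exp(-1j * omega_n * m)
        out.append(acc)
    return out


def analyze(f, h, omega_n):
    """Return (amplitude, phase derivative per sample) for one channel."""
    F = short_time_spectrum(f, h, omega_n)
    amp = [abs(v) for v in F]
    dphi = [0.0]
    for prev, cur in zip(F, F[1:]):
        # Angle of cur * conj(prev) is the phase change, wrapped to (-pi, pi].
        dphi.append(cmath.phase(cur * prev.conjugate()))
    return amp, dphi


def synthesize(amp, dphi, omega_n):
    """Oscillator at omega_n, frequency-modulated by dphi, amplitude-modulated by amp."""
    out = []
    phi = 0.0  # the initial phase is not transmitted; zero is assumed here
    for k, (a, d) in enumerate(zip(amp, dphi)):
        phi += d  # integrate the phase derivative
        out.append(a * math.cos(omega_n * k + phi))
    return out


if __name__ == "__main__":
    omega_n = 2 * math.pi * 0.0975   # channel centre, radians per sample
    delta = 2 * math.pi * 0.005      # offset of the test tone from the centre
    f = [math.cos((omega_n + delta) * k) for k in range(200)]
    h = [1.0 / 40] * 40              # 40-sample moving average as the low-pass
    amp, dphi = analyze(f, h, omega_n)
    y = synthesize(amp, dphi, omega_n)
    print(amp[100], dphi[100], delta)
```

For a steady tone slightly above the channel centre, the phase derivative settles to the frequency offset and the amplitude to a constant, so the oscillator reproduces a tone at the input frequency. A full vocoder would repeat this for every channel and sum the outputs.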