This article focuses on speech coding methods for achieving communication-quality speech at bit rates of 4 kbit/s and lower. The speech coding techniques are based on an all‐pole model of the vocal tract, which may be implemented in the time domain with appropriately selected excitation functions or fitted to a spectral analysis of the speech signal. Three main types of coders are described. Code‐excited linear prediction (CELP) coders select their excitation from waveform codebooks using analysis‐by‐synthesis closed‐loop techniques, which need to be supplemented by speech classification and open‐loop parametric techniques to maintain quality at lower rates. The prototypical sinusoidal coder (SC) uses a bank of oscillators for signal synthesis, driven by a model of the magnitude spectrum; at low rates, however, phase regeneration is important for enhancing speech reconstruction. Waveform interpolation (WI) coders afford a wider time‐frequency footprint for the representation of the excitation and show good potential for achieving toll quality at bit rates below 4 kbit/s.
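As an illustration of the all‐pole model mentioned above, the following is a minimal sketch of fitting predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. The function names, the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the second-order test filter are assumptions made for this example; they are not taken from the article.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.

    Returns the coefficient list a (with a[0] == 1) and the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def all_pole_synthesize(a, excitation):
    """Run an excitation through 1/A(z): y[n] = e[n] - sum_j a[j] * y[n - j]."""
    out = []
    for n, e in enumerate(excitation):
        y = e - sum(a[j] * out[n - j] for j in range(1, len(a)) if n - j >= 0)
        out.append(y)
    return out


# Hypothetical second-order "vocal tract" filter excited by a single impulse.
true_a = [1.0, -1.3, 0.6]
signal = all_pole_synthesize(true_a, [1.0] + [0.0] * 199)
est_a, residual_energy = levinson_durbin(autocorr(signal, 2), 2)
```

In this sketch the excitation is a single impulse; an actual coder would instead choose the excitation from a codebook (CELP) or represent it parametrically, as the abstract describes.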
The kazoo, a wind instrument, produces its characteristic sound when excited by voiced speech. Using this instrument, this paper proposes a novel technique to recover the glottal pulse excitation of its player. Multiband frequency techniques were applied to the kazoo signal, and the results were compared with the corresponding electroglottograph (EGG) recordings. By controlling the embouchure on the instrument, the player can make recordings of spoken and sung speech, as well as recitative, at the instrument’s resonator cap that closely fit the EGG recordings. Spectrogram analysis revealed the spectral envelope in the lower frequency band of the kazoo signals and, in the higher frequency band, the pitch harmonics mixed with the spectral decay of the glottal pulse. A quadrature mirror filter (QMF) was designed to provide this source-filter separation. Additionally, a reverse spectral band replication (SBR) technique was applied, in which the lower frequency band is recovered by demodulating the higher frequency band and then applying a total-energy spectral gain adjustment; the resulting signal was then evaluated. Finally, a subjective evaluation together with SNR and SD measures demonstrates the efficiency of the proposed method.
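The two-band split performed by a QMF can be sketched as follows. The example assumes the shortest possible QMF pair, the 2-tap Haar filters; the filter actually designed in the paper is not specified in the abstract. The function names are hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # Haar filter tap; the high-pass is h1[n] = (-1)^n * h0[n]


def haar_qmf_analysis(x):
    """Split an even-length signal into low and high bands, each decimated by 2."""
    half = len(x) // 2
    low = [(x[2 * k] + x[2 * k + 1]) * _S for k in range(half)]
    high = [(x[2 * k] - x[2 * k + 1]) * _S for k in range(half)]
    return low, high


def haar_qmf_synthesis(low, high):
    """Recombine the two bands; with Haar filters the reconstruction is exact."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) * _S)
        x.append((l - h) * _S)
    return x


# Assumed toy input: a slow ramp plus a sample-rate alternation.
samples = [0.1 * n + (0.5 if n % 2 == 0 else -0.5) for n in range(16)]
low_band, high_band = haar_qmf_analysis(samples)
rebuilt = haar_qmf_synthesis(low_band, high_band)
```

A longer prototype filter would give sharper band separation than this Haar pair; the sketch only shows the analysis and synthesis structure, not the source-filter separation or the reverse SBR step reported in the paper.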