Abstract: The quality of sound pickup in large rooms, such as auditoriums, conference rooms, or classrooms, is impaired by reverberation and interfering noise sources. These degradations can be minimized by a transducer system that discriminates against sound arrivals from all directions except that of the desired source. A two-dimensional array of microphones can be electronically beam-steered to accomplish this directivity. This report gives the theory, design, and implementation of a microprocessor system for automatic…
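The beam-steering the abstract describes amounts to choosing a per-microphone delay so that signals from the desired direction add coherently. As a hedged sketch of that idea, not the report's actual microprocessor implementation, the steering delays for a planar (two-dimensional) array can be computed from a far-field plane-wave model; the function name, array geometry, and speed of sound below are illustrative assumptions:

```python
import numpy as np

def steering_delays(mic_xy, azimuth_deg, elevation_deg, c=343.0):
    """Per-microphone delays (seconds) that steer a planar array's
    beam toward a far-field source in the given direction.
    mic_xy: (N, 2) array of microphone positions in metres."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    # Unit vector toward the source, projected onto the array plane.
    u = np.array([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az)])
    # Microphones closer to the source receive the wavefront earlier,
    # so they get larger delays; summing then aligns all channels.
    tau = -(mic_xy @ u) / c
    return tau - tau.min()  # shift so the smallest delay is zero

# Example: 4-microphone square array, 0.5 m spacing, source at 30° azimuth
mics = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
delays = steering_delays(mics, azimuth_deg=30.0, elevation_deg=0.0)
print(delays)
```

In a real implementation these delays would be quantized to the sampling period (or realized with fractional-delay filters) before being applied to the channels.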
“…When speech is transmitted in an acoustical environment similar to an office room, it will be degraded by background noise and reverberation (Boll 1979; Cheng & O'Shaughnessy 1991; Ephraim & Trees 1995; Flanagan et al 1985; Huang & Zhao 1998; Jensen & Hansen 2001; Mittal & Phamdo 2000; Miyoshi & Kaneda 1988; Nemer et al 2002; Oh & Viswanathan 1992; Satyanarayana 1999; Scalart & Benmar 1996; Silverman 1987; Subramaniam et al 1996; Yegnanarayana et al 1997, 1999; Yegnanarayana & Murthy 2000). The multichannel case is more effective for enhancement than the single-channel case, but requires estimation of time delays (Flanagan et al 1985).…”
“…The multichannel case is more effective for enhancement than the single-channel case, but requires estimation of time delays (Flanagan et al 1985). A simple method for enhancement in the multichannel case is to add the speech signals after compensating for the delays.…”
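The "add the signals after compensating for the delays" method quoted above is classic delay-and-sum processing: aligned speech adds coherently while uncorrelated noise partially cancels. A minimal NumPy sketch, not code from any cited paper; the signal, delays, and noise levels are assumed purely for demonstration:

```python
import numpy as np

def delay_and_sum(channels, delays, fs):
    """Align multichannel recordings by integer-sample delay
    compensation, then average them (simple delay-and-sum)."""
    shifts = np.round(np.asarray(delays) * fs).astype(int)
    # Trim all channels to the common overlapping length after shifting.
    n = min(len(ch) - s for ch, s in zip(channels, shifts))
    aligned = np.stack([ch[s:s + n] for ch, s in zip(channels, shifts)])
    return aligned.mean(axis=0)

# Two noisy copies of a 200 Hz tone; the second arrives 5 samples later.
fs = 8000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 200 * t)
rng = np.random.default_rng(0)
ch1 = clean + 0.3 * rng.standard_normal(fs)
ch2 = np.concatenate([np.zeros(5), clean])[:fs] + 0.3 * rng.standard_normal(fs)
enhanced = delay_and_sum([ch1, ch2], delays=[0.0, 5 / fs], fs=fs)
```

With M channels carrying independent noise, averaging after alignment reduces the noise power by roughly a factor of M, which is the basic gain this enhancement method provides.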
Speech analysis is traditionally performed using short-time analysis to extract features in the time and frequency domains. The window size for the analysis is fixed somewhat arbitrarily, mainly to account for the time-varying vocal tract system during production. However, in its primary mode of excitation, speech is produced by impulse-like excitation in each glottal cycle. Anchoring speech analysis around the glottal closure instants (epochs) therefore yields significant benefits: it helps not only to segment the speech signal according to speech production characteristics, but also to analyse speech accurately. It enables extraction of important acoustic-phonetic features such as glottal vibrations, formants, and the instantaneous fundamental frequency. The epoch sequence is useful for manipulating prosody in speech synthesis applications, and accurate estimation of epochs helps in characterizing voice quality features. Epoch extraction also helps in speech enhancement and multispeaker separation. In this tutorial article, the importance of epochs for speech analysis is discussed, methods to extract epoch information are reviewed, and applications of epoch extraction to some speech tasks are demonstrated.
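One feature the abstract mentions, the instantaneous fundamental frequency, follows directly from the epoch sequence: each F0 value is the reciprocal of the interval between successive epochs. A minimal sketch assuming epoch instants (in seconds) are already available from some extraction method; the function name and the voicing limits are illustrative assumptions:

```python
import numpy as np

def instantaneous_f0(epochs, fmin=50.0, fmax=500.0):
    """Instantaneous fundamental frequency from epoch (glottal closure)
    instants given in seconds: the reciprocal of each epoch interval.
    Intervals outside [1/fmax, 1/fmin] are treated as unvoiced (NaN)."""
    periods = np.diff(np.asarray(epochs, dtype=float))
    f0 = 1.0 / periods
    f0[(f0 < fmin) | (f0 > fmax)] = np.nan
    return f0

# Synthetic epoch train: a 125 Hz voiced stretch, then a long silent gap.
epochs = np.concatenate([np.arange(0.0, 0.1, 0.008), [0.3]])
f0 = instantaneous_f0(epochs)
print(f0)  # ≈ 125 Hz for the voiced stretch, NaN at the gap
```

Because no windowing is involved, this estimate follows pitch changes cycle by cycle, which is exactly the advantage of epoch-anchored analysis over fixed-window short-time analysis.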
Significance of epochs in speech analysis

Speech is the output of a time-varying vocal tract system excited by a time-varying excitation. In the resulting speech signal, the information about the speech production system is embedded in the relations among the sequence of sample values of the signal. The main objective of speech signal processing is to extract information about the time-varying characteristics of the speech production system. This information is represented in the form of parameters or features derived from the signal. Knowledge at different levels, such as acoustic-phonetic, prosodic, lexical, and syntactic, is used to interpret the message in the speech signal from the sequence of parameter or feature vectors. Thus, an algorithmic way of extracting the information in the speech signal involves operations of representation (in terms of extracted parameters or features) and processing (to extract the information or message), in that order.
“…These time differences and intensity differences can be used to localize the sound source [1–4, 7, 13, 14]. In practice, due to the presence of reverberation, the TDOAs are more reliable for sound source localization, and hence are commonly used as the primary basis for source localization.…”
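TDOA estimation, which the snippet identifies as the primary basis for localization, is commonly done by locating the peak of the cross-correlation between two channels. A minimal unweighted sketch (generalized cross-correlation weightings such as PHAT, often preferred under reverberation, are omitted; the names and signal parameters here are assumptions for illustration):

```python
import numpy as np

def tdoa_cross_correlation(x, y, fs):
    """Time difference of arrival of y relative to x, estimated as the
    lag of the peak of the full cross-correlation (no GCC weighting)."""
    corr = np.correlate(y, x, mode="full")
    # Index len(x)-1 corresponds to zero lag; positive lags mean y lags x.
    lag = np.argmax(corr) - (len(x) - 1)
    return lag / fs

# Two microphones recording the same white-noise source, offset 12 samples.
fs = 16000
rng = np.random.default_rng(1)
src = rng.standard_normal(2000)
delay = 12  # samples
x = src
y = np.concatenate([np.zeros(delay), src[:-delay]])
tdoa = tdoa_cross_correlation(x, y, fs)
print(tdoa)  # 12 / 16000 = 0.00075 s
```

Given the TDOA and the known microphone spacing, the source bearing follows from simple geometry; resolution is limited to one sampling period unless the correlation peak is interpolated.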
“…Review of noise-robust methods for automatic speech recognition. General review: a first set of methods to deal with background noise separates the noise from the speech signal (Flanagan et al, 1985), but these algorithms usually make use of two or more microphones and assume that the noise source is far from the speech source. It is then possible to use stereophonic effects to separate the two sound sources.…”