Mel frequency cepstral coefficients (MFCC) are the most widely used speech features in automatic speech recognition systems, primarily because the coefficients fit well with the assumptions used in hidden Markov models and because of the superior noise robustness of MFCC over alternative feature sets such as linear prediction-based coefficients. The authors have recently introduced human factor cepstral coefficients (HFCC), a modification of MFCC that uses the known relationship between center frequency and critical bandwidth from human psychoacoustics to decouple filter bandwidth from filter spacing. In this work, the authors introduce a variation of HFCC called HFCC-E in which filter bandwidth is linearly scaled in order to investigate the effects of wider filter bandwidth on noise robustness. Experimental results show an increase in signal-to-noise ratio of 7 dB over traditional MFCC algorithms when filter bandwidth increases in HFCC-E. An important attribute of both HFCC and HFCC-E is that the algorithms only differ from MFCC in the filter bank coefficients: increased noise robustness using wider filters is achieved with no additional computational cost.
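The core idea of decoupling filter bandwidth from filter spacing can be sketched as follows. This is an illustrative triangular filter bank with a linear bandwidth scale factor, not the published HFCC-E bandwidth rule; the function names and the `bw_scale` parameter are this sketch's own.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert mel-scale values back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def triangular_filterbank(n_filters, n_fft, fs, bw_scale=1.0):
    """Triangular filter bank on mel-spaced center frequencies.

    bw_scale > 1 widens each triangle about its center while the
    center frequencies (filter spacing) stay fixed -- the decoupling
    of bandwidth from spacing that HFCC/HFCC-E exploit.
    """
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    freqs = np.linspace(0.0, fs / 2, n_fft // 2 + 1)
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, center, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        # widen the triangle about its center without moving the center
        lo = center - bw_scale * (center - lo)
        hi = center + bw_scale * (hi - center)
        left = (freqs - lo) / (center - lo)
        right = (hi - freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(left, right))
    return fb
```

Because only the filter bank matrix changes, the rest of the cepstral pipeline (log energies followed by a DCT) is identical for any `bw_scale`, which is why wider filters add no computational cost.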
Current automatic acoustic detection and classification of microchiroptera utilize global features of individual calls (i.e., duration, bandwidth, frequency extrema), an approach that stems from expert knowledge of call sonograms. This approach parallels the acoustic phonetic paradigm of human automatic speech recognition (ASR), which relied on expert knowledge to account for variations in canonical linguistic units. ASR research eventually shifted from acoustic phonetics to machine learning, primarily because of the superior ability of machine learning to account for signal variation. To compare machine learning with conventional methods of detection and classification, nearly 3000 search-phase calls were hand labeled from recordings of five species: Pipistrellus bodenheimeri, Molossus molossus, Lasiurus borealis, L. cinereus semotus, and Tadarida brasiliensis. The hand labels were used to train two machine learning models: a Gaussian mixture model (GMM) for detection and classification and a hidden Markov model (HMM) for classification. The GMM detector produced 4% error compared to 32% error for a baseline broadband energy detector, while the GMM and HMM classifiers produced errors of 0.6 +/- 0.2% compared to 16.9 +/- 1.1% error for a baseline discriminant function analysis classifier. The experiments showed that machine learning algorithms produced errors an order of magnitude smaller than those for conventional methods.
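The classification half of the pipeline above can be sketched with per-class Gaussian mixture models: one GMM is fit to the frame-level features of each species, and a call is labeled by the model with the highest average log-likelihood. The synthetic features and species names here are placeholders, not the study's hand-labeled call data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for per-frame call features; the study used
# features extracted from nearly 3000 hand-labeled search-phase calls.
n_dim = 4
train = {
    "species_a": rng.normal(loc=0.0, scale=1.0, size=(500, n_dim)),
    "species_b": rng.normal(loc=3.0, scale=1.0, size=(500, n_dim)),
}

# One GMM per class, fit only to that class's training frames.
models = {
    label: GaussianMixture(n_components=2, random_state=0).fit(X)
    for label, X in train.items()
}

def classify(frames, models):
    """Label a call by the GMM with the highest mean frame log-likelihood."""
    scores = {label: m.score(frames) for label, m in models.items()}
    return max(scores, key=scores.get)
```

Unlike a discriminant function on global call features (duration, bandwidth, frequency extrema), this frame-based likelihood model accumulates evidence across the whole call, which is one way machine learning absorbs signal variation that expert-defined features miss.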
We propose a method of predicting intrauterine pressure (IUP) from external electrohysterograms (EHG) using a causal FIR Wiener filter. IUP and 8-channel EHG data were collected simultaneously from 14 laboring patients at term, and prediction models were trained and tested using 10-min windows for each patient and channel. RMS prediction error ranged from 5 to 14 mmHg across all patients. We performed a 4-way analysis of variance on the RMS error, which varied across patient, channel, time (test window), and model (train window). The patient-channel interaction was the most significant factor while channel alone was not significant, indicating that different channels produced significantly different RMS errors depending on the patient. The channel-time interaction was significant due to single-channel bursty noise, while time was a significant factor due to multichannel bursty noise. The time-model interaction was not significant, supporting the assumption that the random process generating the IUP and EHG signals was stationary. The results demonstrate the capabilities of the optimal linear filter in predicting IUP from external EHG and offer insight into the factors that affect prediction error of IUP from multichannel EHG recordings.
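A causal FIR Wiener filter of this kind can be fit by least squares: stack the current and past input samples into a design matrix and solve for the tap weights that minimize squared prediction error. This is a minimal sketch of the setup, where `x` stands in for one EHG channel and `y` for the IUP signal; the function names and filter order are illustrative, not the paper's.

```python
import numpy as np

def fit_causal_fir(x, y, order):
    """Least-squares FIR Wiener filter: predict y[n] from x[n-k], k = 0..order-1.

    Builds a design matrix of current and past input samples (zeros before
    the start of the record) and solves the least-squares problem, which is
    equivalent to the Wiener-Hopf normal equations with sample estimates of
    the correlations.
    """
    n = len(x)
    X = np.zeros((n, order))
    for k in range(order):
        X[k:, k] = x[: n - k]  # causal: column k holds x delayed by k samples
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return h

def apply_fir(x, h):
    """Filter x with the causal FIR coefficients h."""
    return np.convolve(x, h)[: len(x)]
```

Training on one 10-min window and testing on another (the "model" and "time" factors in the ANOVA) amounts to fitting `h` on the train window and evaluating the RMS of `y - apply_fir(x, h)` on the test window.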
The echo state network (ESN) has been recently proposed as an alternative recurrent neural network model. An ESN consists of a reservoir of conventional processing elements, which are recurrently interconnected with untrained random weights, and a readout layer, which is trained using linear regression methods. The key advantage of the ESN is the ability to model systems without the need to train the recurrent weights. In this paper, we use an ESN to model the production of speech signals in a classification experiment using isolated utterances of the English digits "zero" through "nine." One prediction model for each digit was trained using frame-based speech features (cepstral coefficients) from all train utterances, and the readout layer consisted of several linear regressors which were trained to target different portions of the time series using a dynamic programming algorithm (Viterbi). Each novel test utterance was classified with the label from the digit model with the minimum mean squared prediction error. Using a corpus of 4130 isolated digits from 8 male and 8 female speakers, the highest classification accuracy attained with an ESN was 100.0% (99.1%) on the train (test) set, compared to 100% (94.7%) for a hidden Markov model (HMM). HMM performance increased to 100.0% (99.8%) when context features (first- and second-order temporal derivatives) were appended to the cepstral coefficients. The ESN offers an attractive alternative to the HMM because of the ESN's simple train procedure, low computational requirements, and inherent ability to model the dynamics of the signal under study.

An ESN consists of two components: 1) a reservoir of recurrently interconnected processing elements with a static (untrained) random weight matrix, and 2) a readout which projects the reservoir state values onto a linear regressor. The reservoir weight matrix is randomly determined subject to simple constraints in order to satisfy the "echo state" requirements [11]. The constraints on the reservoir weight matrix guarantee that the reservoir state is driven by the input with fading memory. The simple train procedure of the ESN distinguishes it from other RNN designs that employ computationally expensive iterative train methods, making the ESN an attractive model for a wide range of applications. Artificial neural networks are typically employed as classifiers by training the network to produce one of two Bayesian terms: 1) the class-conditional density function p(x|ω_i), or 2) the a posteriori probability P(ω_i|x) for the ith class ω_i [15]. The density p(x|ω_i) can be estimated indirectly by modeling the production of x through a predictive model:

x̂(n+1) = F(X(n), Λ)    (1)

where X(n) is a set of the history of x up to x(n), and the model parameter set Λ of F is determined such that the mean-squared error (MSE) between x(n+1) and the estimate x̂(n+1) is minimized. With some assumptions on the distribution of the error signal, the minimization of the prediction MSE has
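The predictive model of Eq. (1) can be sketched as a minimal ESN: a fixed random reservoir rescaled to satisfy the fading-memory constraint, with a ridge-regression readout trained to predict the next feature vector. All hyperparameters here (reservoir size, spectral radius, ridge penalty) are illustrative, not the paper's settings, and the single linear readout stands in for the paper's Viterbi-segmented bank of regressors.

```python
import numpy as np

rng = np.random.default_rng(1)

class ESNPredictor:
    """Minimal echo state network: fixed random reservoir, linear readout."""

    def __init__(self, n_in, n_res=100, spectral_radius=0.9, ridge=1e-6):
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # rescale so the largest eigenvalue magnitude is below 1:
        # the reservoir state is then driven by the input with fading memory
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        self.ridge = ridge

    def _states(self, X):
        """Run the input sequence through the untrained reservoir."""
        s = np.zeros(self.W.shape[0])
        out = []
        for x in X:
            s = np.tanh(self.W @ s + self.W_in @ x)
            out.append(s.copy())
        return np.array(out)

    def fit(self, X):
        """Train the readout to predict x(n+1) from the reservoir state at n."""
        S = self._states(X[:-1])
        targets = X[1:]
        # ridge-regularized least squares for the readout weights only
        A = S.T @ S + self.ridge * np.eye(S.shape[1])
        self.W_out = np.linalg.solve(A, S.T @ targets)
        return self

    def prediction_mse(self, X):
        """Mean squared one-step prediction error -- the classification score."""
        S = self._states(X[:-1])
        return float(np.mean((S @ self.W_out - X[1:]) ** 2))
```

In the classification scheme described above, one such predictor is trained per digit, and a test utterance receives the label of the model with the minimum prediction MSE.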