This paper presents a word boundary detection technique based on frame classification using the nonlinear characteristics of speech. Bispectral analysis was used to classify speech frames into voiced, unvoiced and noise segments. To improve classification accuracy, bispectral features were combined with other features: short-time energy, zero-crossing rate, autocorrelation and high-to-low frequency ratio.
Experimental results indicate that classification error decreases when bispectral features are combined with the other features. Bispectral features can therefore supplement simple time-domain features for demarcating word boundaries in speech. Results were validated by manual verification.
Index Terms: bispectrum; word boundary detection
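As an illustration of the simple time-domain features named above, the following is a minimal sketch of per-frame short-time energy and zero-crossing rate. It is not the authors' implementation and does not cover the bispectral features; the function names, the frame length and the hop size are assumptions chosen for the example.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped.

    frame_len and hop are illustrative parameters, not values from the paper.
    """
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_features(samples, frame_len=4, hop=2):
    """Return one (energy, zcr) pair per frame."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]
```

Voiced frames typically show high energy with a low zero-crossing rate, and unvoiced frames the reverse, which is why these two features are a common starting point for frame classification before further features are added.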
This paper proposes a set of features for speech recognition based on the psychoacoustic masking phenomenon of the human auditory system. Features are determined using the difference between the spectral energy of speech frames and their global masking thresholds in each of 17 bands of an utterance. Performance of the proposed features in a keyword spotting experiment employing dynamic time warping for feature matching showed the viability of the perceptually significant feature set. For multisyllabic words, the proposed features and mel frequency cepstral coefficients (MFCCs) performed equally well, while for monosyllabic words the proposed set outperformed MFCCs.
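For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the textbook DTW cost between two sequences of feature vectors. It is a generic formulation, not the matching procedure used in the paper; the Euclidean local distance, the unconstrained warping path and the function name are assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Each element of seq_a and seq_b is one frame's feature vector; all
    vectors must have the same dimension. The local distance is Euclidean
    (an assumption, not taken from the paper).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame of seq_b repeated
                                 cost[i][j - 1],      # frame of seq_a repeated
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]
```

In a keyword spotting setting, a template's feature sequence would be compared against candidate segments of the utterance, with a lower cost indicating a closer match. Because the warping path may repeat frames, two renditions of a word spoken at different rates can still align with low cost.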