With the increasing use of audio sensors in surveillance and monitoring applications, event detection from audio streams has emerged as an important research problem. This paper presents a hierarchical approach to audio-based event detection for surveillance. The proposed approach first classifies a given audio frame into vocal and nonvocal events, and then performs further classification into normal and excited events. We model the events using a Gaussian mixture model and optimize the parameters for four different audio features: ZCR, LPC, LPCC and LFCC. Experiments have been performed to evaluate the effectiveness of these features for detecting various normal and excited-state human activities. The results show that the proposed top-down event detection approach performs significantly better than a single-level approach.
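A minimal sketch of the two-level GMM classification described above, using scikit-learn. The feature extraction, component count and trained models are assumptions for illustration; the abstract specifies only that one GMM per event class is fit over ZCR/LPC/LPCC/LFCC features.

```python
# Hypothetical sketch of hierarchical (top-down) GMM event detection:
# level 1 decides vocal vs. nonvocal, level 2 decides normal vs. excited.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components=8):
    """Fit one GMM per event class on its training frames (n_components assumed)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(features)
    return gmm

def classify_frame(x, gmms):
    """Pick the class whose GMM gives the highest log-likelihood."""
    x = np.atleast_2d(x)
    return max(gmms, key=lambda label: gmms[label].score(x))

def detect_event(frame_features, level1, level2_vocal, level2_nonvocal):
    """Top-down detection: vocal/nonvocal first, then normal/excited."""
    coarse = classify_frame(frame_features, level1)       # "vocal" or "nonvocal"
    fine = level2_vocal if coarse == "vocal" else level2_nonvocal
    return coarse, classify_frame(frame_features, fine)   # e.g. ("vocal", "excited")
```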
Sleep apnoea is a sleep-related breathing disorder that causes changes in cardiac and neuronal activity and discontinuities in sleep patterns, observable via electrocardiogram (ECG) and electroencephalogram (EEG). Using both statistical analysis and Gaussian discriminative modelling approaches, this paper presents a pilot study assessing the cross-correlation between EEG frequency bands and heart rate variability (HRV) in normal subjects and sleep apnoea clinical patients. For the study we used EEG (delta, theta, alpha, sigma and beta) and HRV (LF(nu), HF(nu) and LF/HF) features from spectral analysis. The statistical analysis across different sleep stages highlighted that in sleep apnoea patients, the EEG delta, sigma and beta bands exhibited a strong correlation with HRV features. The correlation between EEG frequency bands and HRV features was then examined for sleep apnoea classification using univariate and multivariate Gaussian models (UGs and MGs). The MG outperformed the UG in classification. When EEG and HRV features were combined and modelled with an MG, we achieved 64% correct classification accuracy, a 2% or 8% improvement over using only EEG or ECG features, respectively. When delta and acceleration coefficients of the EEG features were incorporated, the overall accuracy improved to 71%.
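An illustrative sketch (not the study's code) of the two ingredients above: a lag-0 Pearson correlation between an EEG band-power series and an HRV feature, and a maximum-likelihood decision between two multivariate Gaussian (MG) class models over the combined EEG+HRV feature vector. Function names and data shapes are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr, multivariate_normal

def band_hrv_correlation(eeg_band_power, hrv_feature):
    """Correlation between, e.g., delta-band power and LF/HF over epochs."""
    r, p = pearsonr(eeg_band_power, hrv_feature)
    return r, p

def fit_mg(X):
    """Fit one multivariate Gaussian per class; columns are EEG+HRV features."""
    return multivariate_normal(mean=X.mean(axis=0), cov=np.cov(X, rowvar=False))

def classify(x, mg_normal, mg_apnoea):
    """Maximum-likelihood decision between the normal and apnoea models."""
    return "apnoea" if mg_apnoea.logpdf(x) > mg_normal.logpdf(x) else "normal"
```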
In this paper, we report the influence on classification accuracy in speech analysis of a clinical dataset when acoustic low-level descriptors (LLDs) belonging to prosodic features (i.e., pitch, formants, energy, jitter, shimmer) and spectral features (i.e., spectral flux, centroid, entropy and roll-off), along with their delta (Δ) and delta-delta (Δ-Δ) coefficients, are added to two baseline features: Mel-frequency cepstral coefficients and the Teager energy critical-band based autocorrelation envelope (TEO-CB-Auto-Env). The extracted LLDs that displayed an increase in accuracy after being added to these baseline features were finally modelled together using Gaussian mixture models and tested. A clinical dataset of speech from 139 adolescents, including 68 (49 girls and 19 boys) diagnosed as clinically depressed, was used in the classification experiments. For male subjects, the combination (TEO-CB-Auto-Env + Δ + Δ-Δ) + F0 + (LogE + Δ + Δ-Δ) + (Shimmer + Δ) + Spectral Flux + Spectral Roll-off gave the highest classification rate of 77.82%, while for female subjects, using TEO-CB-Auto-Env gave an accuracy of 74.74%.
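A hedged sketch of the feature-augmentation step: appending delta and delta-delta coefficients to a base LLD stream before GMM modelling. librosa's regression-based delta filter is one standard way to compute these; the paper's exact regression window is not specified here.

```python
import numpy as np
import librosa

def add_deltas(lld):
    """Stack an LLD matrix with its delta and delta-delta coefficients.

    lld: array of shape (n_features, n_frames), e.g. MFCC or TEO-CB-Auto-Env
    per frame. Returns shape (3 * n_features, n_frames).
    """
    d1 = librosa.feature.delta(lld, order=1)   # delta (Δ)
    d2 = librosa.feature.delta(lld, order=2)   # delta-delta (Δ-Δ)
    return np.vstack([lld, d1, d2])
```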
This paper proposes a novel framework for music content indexing and retrieval. The music structure information, i.e., timing, harmony and music region content, is represented by the layers of a music structure pyramid. We begin by extracting this layered structure information: we analyze the rhythm of the music and then segment the signal proportionally to the inter-beat intervals. Thus, the timing information is incorporated into the segmentation process, which we call Beat Space Segmentation. To describe Harmony Events, we propose a two-layer hierarchical approach to model the music chords. We also model the progression of instrumental and vocal content as Acoustic Events. After information extraction, we propose a vector space modeling approach that uses these events as the indexing terms. In query-by-example music retrieval, a query is represented by a vector of the statistics of the n-gram events. We then propose two effective retrieval models: a hard-indexing scheme and a soft-indexing scheme. Experiments show that the vector space modeling is effective in representing the layered music information, achieving 82.5% top-5 retrieval accuracy using 15-second music clips as queries. The soft-indexing scheme generally outperforms hard-indexing.
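A minimal sketch of the n-gram vector space idea: each clip becomes a vector of n-gram counts over its event sequence (e.g., Harmony or Acoustic Events), and retrieval ranks indexed clips by cosine similarity against the query vector. Event detection itself (Beat Space Segmentation, chord modelling) is assumed done upstream; this corresponds to the hard-indexing flavour, with counts rather than the soft posterior weights.

```python
from collections import Counter
import math

def ngram_vector(events, n=2):
    """Count n-grams over a sequence of event labels, e.g. chord symbols."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def retrieve(query_events, index, n=2, top_k=5):
    """Rank indexed clips against a query clip by n-gram cosine similarity."""
    q = ngram_vector(query_events, n)
    scores = {clip: cosine(q, vec) for clip, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```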
Music structure is very important for semantic music understanding. We propose a novel approach to popular music structure detection. The proposed approach applies beat space segmentation, chord detection, singing voice boundary detection, and melody- and content-based similarity region detection to music structure detection. A frequency scaling called the "Octave Scale" is used to calculate the cepstral coefficients that represent the music content. The experiments illustrate that the proposed approach achieves better performance than existing methods. We also outline some applications that can use our refined music structural analysis.
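A rough sketch of what "Octave Scale" cepstral coefficients could look like: analogous to MFCCs, but with filter-bank bands spaced by octaves (powers of two) so each band spans one musical octave. The band edges, starting frequency and band count below are assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.fftpack import dct

def octave_cepstral_coeffs(power_spectrum, sr, n_fft, f_min=55.0, n_octaves=7):
    """Pool an FFT power spectrum into octave bands, then take log + DCT.

    power_spectrum: length n_fft // 2 + 1, one analysis frame.
    """
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = f_min * 2.0 ** np.arange(n_octaves + 1)   # 55, 110, 220, ... Hz (assumed)
    band_energy = np.array([
        power_spectrum[(freqs >= lo) & (freqs < hi)].sum() + 1e-10
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return dct(np.log(band_energy), norm="ortho")     # cepstral coefficients
```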