The human activity that takes place in meeting rooms or classrooms is reflected in a rich variety of acoustic events, produced either by the human body or by objects handled by humans, so determining both the identity of sounds and their position in time may help to detect and describe that human activity. Additionally, the detection of sounds other than speech may be useful for enhancing the robustness of speech technologies such as automatic speech recognition.

Automatic detection and classification of acoustic events is the objective of this thesis work. It aims at processing the acoustic signals collected by distant microphones in meeting-room or classroom environments to convert them into symbolic descriptions corresponding to a listener's perception of the different sound events present in the signals and their sources.

First of all, the task of acoustic event classification is addressed using Support Vector Machine (SVM) classifiers, a choice motivated by the scarcity of training data. A confusion-matrix-based variable-feature-set clustering scheme is developed for the multiclass recognition problem and tested on the gathered database. With it, a higher classification rate than the GMM-based technique is obtained, achieving a large relative average error reduction with respect to the best result from the conventional binary tree scheme. Moreover, several ways to extend SVMs to sequence processing are compared, in an attempt to overcome a drawback of SVMs when dealing with audio data, namely their restriction to fixed-length vectors; the dynamic time warping kernels are observed to work well for sounds that show a temporal structure. Furthermore, concepts and tools from fuzzy theory are used to investigate, first, the importance of and degree of interaction among features, and second, ways to fuse the outputs of several classification systems.
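The dynamic-time-warping kernels mentioned above rest on the classic DTW alignment cost. The following is a minimal sketch for one-dimensional feature sequences; the thesis's actual features and kernel form are not specified here, and the Gaussian-of-DTW kernel shown is one common construction, not necessarily the one used:

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D sequences.

    DTW allows a non-linear time alignment, so sequences of different
    lengths can still be compared frame by frame.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])      # local frame distance
            d[i][j] = cost + min(d[i - 1][j],     # insertion
                                 d[i][j - 1],     # deletion
                                 d[i - 1][j - 1]) # match
    return d[n][m]

def dtw_kernel(a, b, gamma=1.0):
    """One common way to turn the DTW cost into an SVM kernel
    (illustrative; such kernels are not positive definite in general)."""
    return math.exp(-dtw_distance(a, b) / gamma)
```

Because the alignment absorbs local timing differences, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0: the repeated frame is matched at no cost, which is exactly the property that makes DTW attractive for sounds with a temporal structure.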
Two kinds of databases are used to develop and test the AEC systems: two databases of isolated acoustic events, and a database of interactive seminars containing a significant number of acoustic events of interest. Our developed systems, which consist of SVM-based classification within a sliding window plus post-processing, were the only submissions not using HMMs, and each of them obtained competitive results in the corresponding evaluation.

Speech activity detection was also pursued in this thesis since it is, in fact, a particular, and especially important, case of acoustic event detection. An enhanced SVM training approach for the speech activity detection task is developed, mainly to cope with the problem of dataset reduction. The resulting SVM-based system is tested with several NIST Rich Transcription (RT) evaluation datasets, and it shows better scores than our GMM-based system, which ranked among the best systems in the RT06 evaluation.

Finally, it is worth mentioning a few side outcomes of this thesis work. As it has been carried out in the framework of the CHIL EU project, the author has been responsible for the organization of the above m...
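The sliding-window-plus-post-processing idea can be illustrated with the smoothing step alone: per-window labels (from any classifier) are cleaned up by a majority vote over neighbouring windows. This is a generic sketch, not the exact post-processing of the submitted systems:

```python
from collections import Counter

def smooth_labels(window_labels, context=2):
    """Majority-vote post-processing of per-window classifier decisions.

    Each window's label is replaced by the most frequent label among its
    neighbours within +/- `context` windows, which removes short,
    spurious detections produced by isolated misclassified windows.
    """
    smoothed = []
    for i in range(len(window_labels)):
        lo = max(0, i - context)
        hi = min(len(window_labels), i + context + 1)
        smoothed.append(Counter(window_labels[lo:hi]).most_common(1)[0][0])
    return smoothed
```

For example, a single stray "cough" window inside a run of "speech" windows is voted away: `smooth_labels(["speech", "speech", "cough", "speech", "speech"])` returns five "speech" labels.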
Acoustic events produced in controlled environments may carry information useful for perceptually aware interfaces. In this paper we focus on the problem of classifying 16 types of meeting-room acoustic events. First of all, we have defined the events and gathered a sound database. Then, several classifiers based on support vector machines (SVM) are developed using confusion-matrix-based clustering schemes to deal with the multi-class problem. Also, several sets of acoustic features are defined and used in the classification tests. In the experiments, the developed SVM-based classifiers are compared with an already reported binary tree scheme and with their corresponding Gaussian mixture model (GMM) classifiers. The best results are obtained with a tree SVM-based classifier that may use a different feature set at each node. With it, a 31.5% relative average error reduction is obtained with respect to the best result from a conventional binary tree scheme.
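One way to picture confusion-matrix-based clustering is as greedy agglomeration: the two clusters most confused with each other are merged repeatedly, and the merge order suggests the node structure of a tree of binary classifiers. The matrix values and class names below are invented for illustration; the paper's actual scheme, including per-node feature sets, is richer than this sketch:

```python
def confusion_clustering(conf, labels):
    """Greedy agglomerative clustering driven by a confusion matrix.

    Repeatedly merges the pair of clusters with the largest symmetrised
    off-diagonal confusion mass, returning the list of merges in order.
    """
    conf = [row[:] for row in conf]          # work on a copy
    clusters = [(lab,) for lab in labels]
    merges = []
    while len(clusters) > 1:
        n = len(clusters)
        i, j = max(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda p: conf[p[0]][p[1]] + conf[p[1]][p[0]],
        )
        merges.append((clusters[i], clusters[j]))
        # fold row and column j into cluster i, then drop j
        for k in range(n):
            conf[i][k] += conf[j][k]
        for row in conf:
            row[i] += row[j]
        del conf[j]
        for row in conf:
            del row[j]
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges
```

With three made-up classes where "cough" and "laugh" confuse each other heavily, those two are merged first, and the combined cluster is then merged with the remaining class; reading the merges in reverse gives a binary tree whose root separates the easiest split.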
Adaptive probabilistic modelling of the EEG background is proposed for seizure detection in neonates with hypoxic ischemic encephalopathy. The decision is made based on the temporal derivative of the seizure probability with respect to the adaptively modelled level of background activity. The robustness of the system to long-duration 'seizure-like' artifacts, in particular those due to respiration, is improved. The system was developed using statistical leave-one-patient-out performance assessment on a large clinical dataset comprising 38 patients and totaling 1479 hours. The developed technique was then validated by a single test on a separate, totally unseen, randomized prospective dataset of 51 neonates totaling 2540 hours. By exploiting the proposed adaptation, the ROC area is increased from 93.4% to 96.1% (a 41% relative improvement). The number of false detections per hour is decreased from 0.42 to 0.24, while maintaining the correct detection of seizure burden at 70%. These results on the unseen data were predicted by the rigorous leave-one-patient-out validation and confirm the validity of our algorithm development process.
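Leave-one-patient-out assessment can be sketched as a split generator over patient-tagged records, ensuring no patient contributes data to both sides of any split (names and record shapes here are illustrative, not the paper's actual data layout):

```python
def leave_one_patient_out(records):
    """Yield (held_out_patient, train, test) splits.

    records: iterable of (patient_id, sample) pairs. Every record of the
    held-out patient goes to the test side, so a patient's data never
    appears in both train and test, which is what makes the estimate
    an honest predictor of performance on unseen patients.
    """
    records = list(records)
    patients = sorted({pid for pid, _ in records})
    for held_out in patients:
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        yield held_out, train, test
```

Grouping by patient rather than by record is the crucial design choice: a plain record-level split would leak each patient's highly correlated EEG into both sides and overstate performance.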
Highlights:
- Seizure detection algorithm (SDA) validated on unseen, unedited EEG of 70 neonates.
- Results at SDA sensitivity settings of 0.5–0.3 acceptable for clinical use.
- Seizure detection rate of 52.6–75.0%, false detection rate 0.04–0.36 FD/h.
Acoustic event classification may help to describe acoustic scenes and contribute to improving the robustness of speech technologies. In this work, fusion of different information sources with the Fuzzy Integral (FI), and the associated Fuzzy Measure (FM), is applied to the problem of classifying a small set of highly confusable human non-speech sounds. As the FI is a meaningful formalism for combining classifier outputs that can capture interactions among the various sources of information, it shows in our experiments a significantly better performance than that of any single classifier entering the FI fusion module. In fact, that FI decision-level fusion approach shows results comparable to the high-performing SVM feature-level fusion, and thus it seems to be a good choice when feature-level fusion is not an option. We have also observed that the importance of, and the degree of interaction among, the various feature types given by the FM can be used for feature selection and gives valuable insight into the problem.

Keywords: acoustic event classification, audio features, fuzzy integral and fuzzy measure, feature-level and decision-level information fusion, feature selection, interaction of information sources, Choquet integral.
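The Choquet integral named in the keywords combines classifier confidences using a fuzzy measure defined on subsets of sources, which is what lets it express interaction rather than a plain weighted average. A minimal sketch, with source names and measure values invented for illustration:

```python
def choquet_integral(scores, measure):
    """Choquet integral of source confidences w.r.t. a fuzzy measure.

    scores:  dict mapping source name -> confidence in [0, 1]
    measure: dict mapping frozenset of source names -> measure value,
             monotone under set inclusion, with the full set mapped to 1.
    """
    # Order sources by descending confidence: h(1) >= h(2) >= ... >= h(n)
    ordered = sorted(scores, key=scores.get, reverse=True)
    values = [scores[s] for s in ordered] + [0.0]
    subset = frozenset()
    total = 0.0
    for i, src in enumerate(ordered):
        subset = subset | {src}
        # weight each confidence drop by the measure of the sources above it
        total += (values[i] - values[i + 1]) * measure[subset]
    return total
```

For example, with scores {"svm": 0.8, "gmm": 0.5} and a measure g({svm}) = 0.6, g({gmm}) = 0.5, g({svm, gmm}) = 1.0, the integral is (0.8 − 0.5)·0.6 + 0.5·1.0 = 0.68; a joint measure below the sum of the singletons (1.0 < 1.1) encodes redundancy between the two sources, while a larger joint value would encode synergy.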
Abstract. In this paper, the Acoustic Event Detection (AED) system developed at the UPC is described, and its results in the CLEAR evaluations carried out in March 2007 are reported. The system uses a set of features composed of frequency-filtered band energies and perceptual features, and it is based on SVM classifiers and multi-microphone decision fusion. Also, the current evaluation setup and, in particular, the two new metrics used in this evaluation are presented.