We present a novel, exemplar-based method for audio event detection based on non-negative matrix factorisation. Building on recent work in noise-robust automatic speech recognition, we model events as a linear combination of dictionary atoms, and mixtures as a linear combination of overlapping events. The weights of activated atoms in an observation serve directly as evidence for the underlying event classes. The atoms in the dictionary span multiple frames and are created by extracting all possible fixed-length exemplars from the training data. To combat the data scarcity of small training sets, we artificially augment the training data by linear time warping in the feature domain at multiple rates. The method is evaluated on the Office Live and Office Synthetic datasets released by the AASP Challenge on Detection and Classification of Acoustic Scenes and Events.
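The core idea of this abstract, solving for non-negative atom activations against a fixed exemplar dictionary and pooling the weights per event class as evidence, can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the dictionary, class labels, dimensions, and random data below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: dictionary atoms would be fixed-length
# exemplars drawn from training data, each carrying a known event label.
n_freq, n_atoms, n_frames = 20, 10, 30
W = rng.random((n_freq, n_atoms))        # exemplar dictionary (held fixed)
atom_labels = np.array([0] * 5 + [1] * 5)  # event class of each atom
V = rng.random((n_freq, n_frames))       # observed mixture spectrogram

# Solve V ~= W @ H for non-negative activations H with standard
# multiplicative updates (Euclidean-cost NMF, W fixed).
H = rng.random((n_atoms, n_frames))
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)

# Summed activation weights per class serve as per-frame event evidence.
evidence = np.vstack([H[atom_labels == c].sum(axis=0) for c in (0, 1)])
print(evidence.shape)  # (2, 30): one evidence track per event class
```

In the actual method the atoms span multiple frames; the single-frame dictionary here only demonstrates the activation-as-evidence mechanism.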
This work examines the use of a Wireless Acoustic Sensor Network (WASN) for the classification of clinically relevant activities of daily living (ADL) of elderly people. The aim of this research is to automatically compile a summary report of the performed ADLs that can be easily interpreted by caregivers. The classification performance of the WASN is evaluated in both clean and noisy conditions. Results indicate a classification accuracy of 75.3±4.3% on clean acoustic data selected from the node with the highest SNR. By incorporating spatial information extracted by the WASN, the classification accuracy further increases to 78.6±1.4%. In addition, in noisy conditions the WASN is on average 8.1% to 9.0% (absolute) more accurate than the best single-microphone results.
This paper gives an overview of research within the ALADIN project, which aims to develop an assistive vocal interface for people with a physical impairment. In contrast to existing approaches, the vocal interface is trained by the end-users themselves, which means it can be used with any vocabulary and grammar, and that it is maximally adapted to the (possibly dysarthric) speech of the user. This paper describes the overall learning framework, the user-centred design and evaluation aspects, database collection, and the approaches taken to combat problems such as noise and erroneous input.
In this paper, two different convolutional neural network (CNN) architectures are investigated for real-time, on-edge domestic acoustic event classification. For training and evaluation of the models, a real-life acoustic dataset was recorded in 72 different home environments. A quantization-aware training scheme was applied that accounts for the models having to run on 8-bit fixed-point processing hardware. Once trained, the models were successfully deployed on an ARM Cortex-M7 microcontroller unit (i.MX RT1064). This study indicates that the proposed procedure can lead to an efficient, real-time, on-edge implementation of a domestic sound event classifier that does not sacrifice classification performance compared to its floating-point counterpart.
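The 8-bit fixed-point constraint mentioned above can be made concrete with a small sketch of symmetric per-tensor int8 quantization, the kind of mapping commonly used for MCU deployment. This is an illustrative assumption, not the exact scheme or toolchain from the paper, and the weight values are made up.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization of float weights to int8.
    A sketch of a typical 8-bit fixed-point mapping for MCU targets;
    not the paper's exact scheme."""
    m = float(np.max(np.abs(w)))
    scale = m / 127.0 if m > 0 else 1.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float weights."""
    return q.astype(np.float32) * scale

# Toy weight tensor (illustrative values, not trained model weights).
w = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print(q)                          # int8 codes
print(np.max(np.abs(w - w_hat)))  # worst-case error, at most scale/2
```

Quantization-aware training simulates exactly this round-trip (quantize then dequantize) in the forward pass, so the network learns weights that remain accurate after the int8 mapping.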
Acoustic event classification for monitoring applications is becoming feasible thanks to the increasing number of connected devices with a built-in microphone. The sound event classes are defined by annotating training data, which is a laborious process. Attempts to reduce the workload of annotating vast amounts of training data include semi-supervised learning and active learning. In this paper, we propose a non-negative matrix deconvolution (NMD) based approach, capable of modelling acoustic events from data labelled at a low time resolution and on a multi-label level, thereby reducing the annotation workload. We further show that the proposed extension of NMD is successfully applied to the classification of acoustic events, even in noisy conditions and with overlapping events.
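Non-negative matrix deconvolution extends NMF with multi-frame templates: the observation is modelled as a sum of templates convolved with their activations. The sketch below shows that convolutive model and a standard multiplicative update for the activations with the templates held fixed. It is a generic Euclidean-cost NMD sketch on random data, not the paper's extension; all sizes and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_freq, n_atoms, n_frames, T = 16, 4, 40, 5  # illustrative sizes

def shift(X, t):
    """Shift columns of X by t frames (right if t > 0, left if t < 0),
    zero-filling the vacated columns."""
    Y = np.zeros_like(X)
    if t > 0:
        Y[:, t:] = X[:, :-t]
    elif t < 0:
        Y[:, :t] = X[:, -t:]
    else:
        Y[:] = X
    return Y

def reconstruct(W, H):
    """Convolutive model: V_hat = sum_t W[t] @ shift(H, t)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

# Multi-frame event templates W are assumed learned elsewhere (from
# weakly labelled data in the paper); random here purely for illustration.
W = rng.random((T, n_freq, n_atoms))
V = rng.random((n_freq, n_frames))   # observed spectrogram
H = rng.random((n_atoms, n_frames))  # activations to estimate

err_before = np.linalg.norm(V - reconstruct(W, H))

# Multiplicative updates for H (Euclidean-cost NMD, W fixed); the
# adjoint of the convolutive model uses left shifts.
for _ in range(100):
    V_hat = reconstruct(W, H)
    num = sum(W[t].T @ shift(V, -t) for t in range(T))
    den = sum(W[t].T @ shift(V_hat, -t) for t in range(T)) + 1e-9
    H *= num / den

err_after = np.linalg.norm(V - reconstruct(W, H))
print(err_before, err_after)  # reconstruction error decreases
```

With templates tied to event classes, the resulting activations H would then serve as the per-class evidence used for classification.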