2016
DOI: 10.1109/taslp.2016.2530401

Learning Representations for Nonspeech Audio Events Through Their Similarities to Speech Patterns

Abstract: The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.

Cited by 26 publications (17 citation statements)
References 52 publications (82 reference statements)
“…Our previous work [23] demonstrated that learned representations that take into account the structure of scene data can be highly discriminative, as state-of-the-art performance can be obtained even with simple linear classifiers. The label tree embeddings used in this work also bear some resemblance with those in [21], [38] in which label tree embeddings of speech patterns were learned to extract generic features for audio events.…”
Section: Generalization (mentioning)
confidence: 99%
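The statement above notes that representations learned over label tree embeddings can be discriminative enough for state-of-the-art results with only a simple linear classifier. A minimal sketch of that evaluation setup, assuming the embedding features have already been extracted; the synthetic feature matrix, dimensionality, and class count below are hypothetical stand-ins, not values from the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for learned label-tree-embedding features:
# 500 audio events, 128-dimensional embeddings, 10 event classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 10, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A plain linear SVM on top of the learned representation;
# no nonlinear kernel or deep classifier is needed in this setup.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The point of the sketch is only the division of labor: the representation carries the discriminative structure, so the classifier on top can stay linear.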
“…The study introduces two techniques, one for speech quality improvement and one for multi-conditional training, and finally applies a joint optimization approach that leverages objective measures on simulated datasets. The study also incorporates a neural-network-based learning model that attains better classification accuracy with a lower error rate [25].…”
Section: (Hertel, M. Maass, R. Mazur and A. Mertins, 2016) (mentioning)
confidence: 99%
“…This is important for smart home and vehicle environments, speech interaction and telecommunication systems, and has relevance to audio-based security monitoring, ambient event detection and auditory scene analysis. Sound event detection research has traditionally been driven by techniques developed for speech recognition, including Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) [1,2,3,4,5]. However, these features and methods have more recently been surpassed by spectrogram-based techniques [6,7], especially for the classification of noise-corrupted sounds.…”
Section: Introduction (mentioning)
confidence: 99%
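The last excerpt contrasts the classical MFCC-plus-GMM pipeline with more recent spectrogram-based methods. A minimal sketch of that traditional baseline (the cited baseline, not the method of the paper itself), assuming librosa and scikit-learn are available; the class names and synthetic clips below are hypothetical placeholders for real labelled recordings:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

sr = 16000  # sample rate assumed for the synthetic clips
rng = np.random.default_rng(0)

def mfcc_frames(signal):
    # 13 MFCCs per frame, transposed to shape (n_frames, 13).
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T

# Hypothetical labelled training clips (noise stands in for real audio).
classes = {
    "door_slam": [rng.normal(size=sr) for _ in range(5)],
    "speech":    [0.5 * rng.normal(size=sr) for _ in range(5)],
}

# Train one GMM per event class on the pooled MFCC frames of that class.
gmms = {}
for label, signals in classes.items():
    frames = np.vstack([mfcc_frames(s) for s in signals])
    gmms[label] = GaussianMixture(
        n_components=4, covariance_type="diag", random_state=0).fit(frames)

# Classify a test clip by the class whose GMM gives the highest
# average frame log-likelihood.
test_clip = rng.normal(size=sr)
test_frames = mfcc_frames(test_clip)
scores = {label: gmm.score(test_frames) for label, gmm in gmms.items()}
print("predicted class:", max(scores, key=scores.get))
```

Spectrogram-based approaches mentioned in the excerpt replace the hand-crafted MFCC front end and per-class GMMs with time-frequency images fed to learned classifiers, which is what makes them more robust to noise-corrupted sounds.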