2016
DOI: 10.1109/taslp.2016.2530401

Learning Representations for Nonspeech Audio Events Through Their Similarities to Speech Patterns

Abstract: The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.

Cited by 26 publications (17 citation statements)
References 52 publications (82 reference statements)
“…Our previous work [23] demonstrated that learned representations that take into account the structure of scene data can be highly discriminative, as state-of-the-art performance can be obtained even with simple linear classifiers. The label tree embeddings used in this work also bear some resemblance with those in [21], [38] in which label tree embeddings of speech patterns were learned to extract generic features for audio events.…”
Section: Generalization (mentioning)
confidence: 99%
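The statement above notes that representations learned over label tree embeddings can be discriminative enough for state-of-the-art results with only a simple linear classifier. A minimal sketch of that evaluation setup, assuming the embedding features have already been extracted; the synthetic feature matrix, dimensionality, and class count below are hypothetical stand-ins, not values from the paper:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical stand-in for learned label-tree-embedding features:
# 500 audio events, 128-dimensional embeddings, 10 event classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))
y = rng.integers(0, 10, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# A plain linear SVM on top of the learned representation;
# no nonlinear kernel or deep classifier is needed in this setup.
clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

The point of the sketch is only the division of labor: the representation carries the discriminative structure, so the classifier on top can stay linear.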
“…The study introduces two techniques, one for speech quality improvement and one for multi-conditional training, and finally applies a joint optimization approach that leverages objective measures on simulated datasets. The study also incorporates a neural-network-based learning model that attains better classification accuracy with a lower error rate [25].…”
Section: (Hertel, M. Maass, R. Mazur and A. Mertins, 2016) (mentioning)
confidence: 99%
“…This is important for smart home and vehicle environments, speech interaction and telecommunication systems, and has relevance to audio-based security monitoring, ambient event detection and auditory scene analysis. Sound event detection research has traditionally been driven by techniques developed for speech recognition, including Mel-frequency cepstral coefficients (MFCCs) and perceptual linear prediction (PLP) coefficients with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) [1,2,3,4,5]. However, these features and methods have more recently been surpassed by spectrogram-based techniques [6,7], especially for the classification of noise-corrupted sounds.…”
Section: Introduction (mentioning)
confidence: 99%
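The last excerpt contrasts the classical MFCC-plus-GMM pipeline with more recent spectrogram-based methods. A minimal sketch of that traditional baseline (the cited baseline, not the method of the paper itself), assuming librosa and scikit-learn are available; the class names and synthetic clips below are hypothetical placeholders for real labelled recordings:

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

sr = 16000  # sample rate assumed for the synthetic clips
rng = np.random.default_rng(0)

def mfcc_frames(signal):
    # 13 MFCCs per frame, transposed to shape (n_frames, 13).
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T

# Hypothetical labelled training clips (noise stands in for real audio).
classes = {
    "door_slam": [rng.normal(size=sr) for _ in range(5)],
    "speech":    [0.5 * rng.normal(size=sr) for _ in range(5)],
}

# Train one GMM per event class on the pooled MFCC frames of that class.
gmms = {}
for label, signals in classes.items():
    frames = np.vstack([mfcc_frames(s) for s in signals])
    gmms[label] = GaussianMixture(
        n_components=4, covariance_type="diag", random_state=0).fit(frames)

# Classify a test clip by the class whose GMM gives the highest
# average frame log-likelihood.
test_clip = rng.normal(size=sr)
test_frames = mfcc_frames(test_clip)
scores = {label: gmm.score(test_frames) for label, gmm in gmms.items()}
print("predicted class:", max(scores, key=scores.get))
```

Spectrogram-based approaches mentioned in the excerpt replace the hand-crafted MFCC front end and per-class GMMs with time-frequency images fed to learned classifiers, which is what makes them more robust to noise-corrupted sounds.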