2017
DOI: 10.1007/s11042-016-4332-z
Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection

Cited by 21 publications (4 citation statements)
References 35 publications
“…Chiba et al [61] recently proposed a multi-stream attention-based BiLSTM network for speech emotion recognition. Li et al [62] extracted deep features by training a multi-stream hierarchical DNN for acoustic event detection. Moreover, Sheikh et al [29] found that settings such as the context frame size, when optimized for one stuttering class, are not good for other stuttering types.…”
Section: Multi-contextual StutterNet (mentioning)
confidence: 99%
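The excerpt names the multi-stream hierarchical DNN of Li et al [62] only in passing and does not reproduce the architecture. As a loose illustration of the general pattern (per-stream sub-networks fused by a shared trunk whose intermediate activations serve as the deep audio feature), here is a minimal PyTorch sketch; the stream inputs, layer sizes, and class count are all assumptions, not the cited design.

```python
# Illustrative sketch of a multi-stream hierarchical DNN feature extractor.
# NOTE: not the architecture of Li et al [62]; stream inputs (e.g. an MFCC
# stream and a log mel-band energy stream) and all sizes are assumptions.
import torch
import torch.nn as nn

class MultiStreamDNN(nn.Module):
    def __init__(self, stream_dims=(39, 40), hidden=256, feat_dim=128, n_classes=10):
        super().__init__()
        # One sub-network per input stream.
        self.streams = nn.ModuleList([
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU())
            for d in stream_dims
        ])
        # Shared trunk fuses the streams; its output is the "deep feature".
        self.fuse = nn.Sequential(
            nn.Linear(hidden * len(stream_dims), hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, xs):
        # xs: list of per-stream tensors, each of shape (batch, stream_dim)
        h = torch.cat([net(x) for net, x in zip(self.streams, xs)], dim=-1)
        feat = self.fuse(h)                  # deep audio feature
        return self.classifier(feat), feat

model = MultiStreamDNN()
logits, deep_feat = model([torch.randn(8, 39), torch.randn(8, 40)])
print(deep_feat.shape)  # torch.Size([8, 128])
```

In a setup like this, the classifier head drives training on event labels, while `deep_feat` is what would be extracted afterwards as the learned representation.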
“…Owing to the development of artificial intelligence (AI) and deep learning (DL), deep features of audio data have been widely studied since 2010 and are used in many audio-based applications, such as acoustic scene classification [77], [78], audio/video analysis [79], and speaker recognition [80].…”
Section: Evolution of Audio Features (mentioning)
confidence: 99%
“…In addition, the common hand-crafted features used for acoustic scene classification (or clustering) include the logarithmic mel-band energy, mel-frequency cepstral coefficients (MFCCs), spectral flux, the spectrogram, Gabor filterbanks, cochleagrams, the i-vector, histogram-of-gradients features [12]-[15], the histogram of gradients of time-frequency representations (HGTR) [14], hash features [16], and local binary patterns [17], [18]. In recent years, transformed features based on matrix factorization [19], [20] and deep neural networks [6], [11], [21] have been used to address the lack of flexibility of hand-crafted features. Hand-crafted or shallow features did not effectively represent the property differences among the various classes of acoustic scenes, and their performance was therefore inferior to that of deep transformed features learned by deep neural networks such as convolutional neural networks (CNNs) [11], [22]-[25].…”
Section: Introduction (mentioning)
confidence: 99%
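The hand-crafted features listed in this excerpt are standard signal-processing descriptors. As a sketch of two of them (log mel-band energy and MFCC) computed with librosa, where the input file name and all analysis parameters (n_fft, hop_length, n_mels, n_mfcc) are illustrative assumptions rather than values from any cited paper:

```python
# Sketch: two hand-crafted features named in the excerpt, computed with librosa.
# All parameter values are illustrative, not those of any cited paper.
import librosa
import numpy as np

y, sr = librosa.load("scene.wav", sr=None)  # hypothetical input clip

# Log mel-band energy: mel-filterbank power spectrogram converted to dB.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=40)
log_mel = librosa.power_to_db(mel)          # shape: (40, n_frames)

# MFCC: discrete cosine transform of the log mel-band energies.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=2048, hop_length=512)  # (13, n_frames)

# Simple clip-level descriptor: per-coefficient mean and std over time.
clip_feat = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(log_mel.shape, mfcc.shape, clip_feat.shape)
```

Frame-level features like these are exactly the kind of input that the deep, transformed features discussed in the excerpt (learned by CNNs or other deep networks) are meant to improve upon.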