Interspeech 2017
DOI: 10.21437/interspeech.2017-1160
Abstract: In this paper we present our system for the detection and classification of acoustic scenes and events (DCASE) 2020 Challenge Task 4: Sound event detection and separation in domestic environments. We introduce two new models: the forward-backward convolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional neural network (CNN). The FBCRNN employs two recurrent neural network (RNN) classifiers sharing the same CNN for preprocessing. With one RNN processing a recording in forward direction…
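The abstract describes a shared CNN front-end feeding two direction-specific RNN classifiers. Below is a minimal PyTorch sketch of that idea; the layer sizes, pooling scheme, and sigmoid tag heads are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of a forward-backward CRNN tagger.
# Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FBCRNNSketch(nn.Module):
    def __init__(self, n_mels=64, n_classes=10, hidden=128):
        super().__init__()
        # Shared CNN front-end over (batch, 1, time, mel) spectrograms.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        feat_dim = 64 * (n_mels // 4)
        # Two unidirectional GRU classifiers sharing the CNN features:
        # one reads the sequence forward, the other reads it reversed.
        self.rnn_fwd = nn.GRU(feat_dim, hidden, batch_first=True)
        self.rnn_bwd = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head_fwd = nn.Linear(hidden, n_classes)
        self.head_bwd = nn.Linear(hidden, n_classes)

    def forward(self, spec):                      # spec: (B, 1, T, n_mels)
        h = self.cnn(spec)                        # (B, C, T, n_mels // 4)
        h = h.permute(0, 2, 1, 3).flatten(2)      # (B, T, feat_dim)
        fwd, _ = self.rnn_fwd(h)                  # left-to-right pass
        bwd, _ = self.rnn_bwd(h.flip(1))          # right-to-left pass
        # Frame-wise tag probabilities from both reading directions.
        p_fwd = torch.sigmoid(self.head_fwd(fwd))
        p_bwd = torch.sigmoid(self.head_bwd(bwd)).flip(1)
        return p_fwd, p_bwd
```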

Citations: cited by 39 publications (56 citation statements).
References: 23 publications (23 reference statements).
“…Extending it with an HMM temporal structure for sub-phonetic units leads to the DP-HMM and the HDP-HMM [69], [70], [71]. HMM-VAE proposes the use of a deep neural network instead of a GMM [72], [73]. These approaches enforce top-down constraints via HMM temporal smoothing and temporal modeling.…”
Section: Related Work
confidence: 99%
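The snippet above contrasts GMM emissions with the HMM-VAE's neural emission model. The sketch below illustrates that swap under stated assumptions: a small VAE whose per-state Gaussian latent priors yield per-frame, per-state scores in place of per-state GMM log-likelihoods. All dimensions and the ELBO form are hypothetical and not taken from the cited works.

```python
# Sketch of the emission-model swap: each HMM state scores a frame
# through a neural encoder/decoder with a state-dependent latent prior,
# instead of a per-state GMM.  Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class StateConditionedVAE(nn.Module):
    def __init__(self, feat_dim=39, latent_dim=16, n_states=50):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 128), nn.Tanh(),
                                 nn.Linear(128, 2 * latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.Tanh(),
                                 nn.Linear(128, feat_dim))
        # One Gaussian latent prior mean per HMM state (acoustic unit state).
        self.prior_mu = nn.Parameter(torch.zeros(n_states, latent_dim))

    def state_log_likelihood(self, x):             # x: (T, feat_dim)
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = -((self.dec(z) - x) ** 2).sum(-1)  # (T,)
        # KL of q(z|x) against each state's prior -> per-state ELBO,
        # used where a GMM would supply per-state log-likelihoods.
        kl = 0.5 * (((mu.unsqueeze(1) - self.prior_mu) ** 2).sum(-1)
                    + logvar.exp().sum(-1, keepdim=True)
                    - logvar.sum(-1, keepdim=True) - mu.size(-1))
        return recon.unsqueeze(1) - kl             # (T, n_states)
```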
“…We evaluated the different AUD algorithms in terms of phonetic segmentation and equivalent Phone Error Rate (eq. PER) ([17, 5]). For the phonetic segmentation we used the standard Recall, Precision and F-score measured against the timing provided in the TIMIT database with the 61 original phones.…”
Section: Data Features and Metrics
confidence: 99%
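The cited evaluation scores phonetic segmentation with Recall, Precision and F-score against TIMIT boundary times. A minimal sketch of such boundary matching follows; the ±20 ms tolerance and the greedy matching rule are assumptions for illustration, not the cited protocol.

```python
# Boundary-based segmentation scoring: boundaries are lists of times in
# seconds; the +/- 20 ms tolerance is an illustrative assumption.
def segmentation_prf(ref_bounds, hyp_bounds, tol=0.02):
    matched = 0
    used = [False] * len(ref_bounds)
    for h in hyp_bounds:
        # Greedily match each hypothesised boundary to an unused
        # reference boundary within the tolerance window.
        for i, r in enumerate(ref_bounds):
            if not used[i] and abs(h - r) <= tol:
                used[i] = True
                matched += 1
                break
    precision = matched / len(hyp_bounds) if hyp_bounds else 0.0
    recall = matched / len(ref_bounds) if ref_bounds else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Example: two of the three hypothesised boundaries fall within tolerance.
print(segmentation_prf([0.10, 0.35, 0.60], [0.11, 0.34, 0.80]))
```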
“…

Model          Features         Prior Language   Recall   Precision   F-score   eq. PER
HMM [5]        MFCC + ∆ + ∆∆    None             -        -           -         65.4
VAE-HMM [5]    MFCC + ∆ + ∆∆    None             -        -           -         58.9
VAE-BHMM [6]   …

iterations. Results, shown in Fig.…”
Section: Model Features
confidence: 99%
“…It was also shown that this model can be further improved by incorporating a Bayesian "phonotactic" language model learned jointly with the acoustic units [4]. Finally, following the work in [5] it has been combined successfully with variational auto-encoders leading to a model combining the potential of both deep neural networks and Bayesian models [6]. The contribution of this work is threefold:…”
Section: Introduction
confidence: 99%