2021 International Joint Conference on Neural Networks (IJCNN) 2021
DOI: 10.1109/ijcnn52387.2021.9533654
|View full text |Cite
|
Sign up to set email alerts
|

ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
25
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
3
1
1
1

Relationship

1
5

Authors

Journals

citations
Cited by 26 publications
(26 citation statements)
references
References 20 publications
0
25
0
Order By: Relevance
“…The environmental sound classification task implies an assignment of correct labels given samples belonging to sound classes that surround us in the everyday life (e.g., "alarm clock", "car horn", "jackhammer", "mouse clicking", "cat"). To successfully solve this task, different approaches were proposed that included the use of one- [27,28] or two-dimensional Convolutional Neural Networks (CNN) operating on static [18,24,32,9,15,17,33,8,30] or trainable [23,10] time-frequency transformation of raw audio. While the first approaches relied on the task-specific design of models, the latter results confirmed that the use of domain adaptation from visual domain is beneficial [9,17,10].…”
Section: Related Workmentioning
confidence: 99%
See 4 more Smart Citations
“…The environmental sound classification task implies an assignment of correct labels given samples belonging to sound classes that surround us in the everyday life (e.g., "alarm clock", "car horn", "jackhammer", "mouse clicking", "cat"). To successfully solve this task, different approaches were proposed that included the use of one- [27,28] or two-dimensional Convolutional Neural Networks (CNN) operating on static [18,24,32,9,15,17,33,8,30] or trainable [23,10] time-frequency transformation of raw audio. While the first approaches relied on the task-specific design of models, the latter results confirmed that the use of domain adaptation from visual domain is beneficial [9,17,10].…”
Section: Related Workmentioning
confidence: 99%
“…To successfully solve this task, different approaches were proposed that included the use of one- [27,28] or two-dimensional Convolutional Neural Networks (CNN) operating on static [18,24,32,9,15,17,33,8,30] or trainable [23,10] time-frequency transformation of raw audio. While the first approaches relied on the task-specific design of models, the latter results confirmed that the use of domain adaptation from visual domain is beneficial [9,17,10]. However, the visual modality was used in a sequential way, implying the processing of only one modality simultaneously.…”
Section: Related Workmentioning
confidence: 99%
See 3 more Smart Citations