2023
DOI: 10.1109/lsp.2023.3319233
Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification

Yuanbo Hou,
Siyang Song,
Chuang Yu
et al.

Abstract: Most deep learning-based acoustic scene classification (ASC) approaches identify scenes based on acoustic features converted from audio clips containing mixed information entangled by polyphonic audio events (AEs). However, these approaches have difficulty explaining what cues they use to identify scenes. This paper conducts the first study on disclosing the relationship between real-life acoustic scenes and semantic embeddings from the most relevant AEs. Specifically, we propose an event-relational graph…

Cited by 4 publications (4 citation statements) · References 40 publications
“…Multiple audio events may be included in a single audio clip. Inspired by [17], we replaced the originally shared single linear layer with per-event linear layers, each containing only one neuron (i.e., FC f i and FC ci , one per event), to obtain a node representation for each event. A single audio clip can therefore provide feature vectors for all events, i.e., E f and E c , and the process can be briefly expressed as follows:…”
Section: Audio Feature Extraction for Multi-granularity
confidence: 99%
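The per-event single-neuron layers described in the statement above can be sketched as follows. This is a minimal NumPy illustration, not the citing paper's implementation; the function names `make_event_heads` and `event_node_features`, the feature dimension, and the shared clip embedding are assumptions, with each FC f i / FC ci modeled as one weight vector plus a bias.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_event_heads(feat_dim, num_events):
    """One single-neuron linear layer (weight vector + bias) per event,
    standing in for the per-event FC f i / FC ci layers in the text."""
    return [(rng.standard_normal(feat_dim), float(rng.standard_normal()))
            for _ in range(num_events)]

def event_node_features(x, heads):
    """Map a shared clip embedding x of shape (feat_dim,) to one value
    per event, yielding an event-feature vector of shape (num_events,)."""
    return np.array([w @ x + b for (w, b) in heads])

# A single clip embedding provides features for all events at once
# (hypothetical sizes: 128-dim embedding, 10 events per granularity):
heads_f = make_event_heads(feat_dim=128, num_events=10)  # fine-grained, E f
heads_c = make_event_heads(feat_dim=128, num_events=10)  # coarse-grained, E c
x = rng.standard_normal(128)
E_f = event_node_features(x, heads_f)
E_c = event_node_features(x, heads_c)
```

The point of the per-event heads is that one forward pass over a single clip populates every event node's feature at once, rather than sharing one layer across all events.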
“…It aims to learn the relationships between different event nodes and to update the node features through subsequent graph convolutional learning. Inspired by [17], we define a hierarchical graph containing multimodal, multi-granularity event nodes, all constructed in a fully connected manner.…”
Section: Hierarchical Graph Construction
confidence: 99%
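A fully connected graph over event nodes, followed by one graph-convolutional update of the node features, might look like the sketch below. This is a hedged illustration under assumed shapes, not the cited paper's code; the symmetric normalization and the single-layer ReLU update are standard GCN choices, and the node-feature matrix `H` and weight `W` are hypothetical.

```python
import numpy as np

def fully_connected_adjacency(n):
    """Adjacency of a fully connected graph with self-loops, followed by
    the symmetric normalization D^{-1/2} A D^{-1/2} common in GCNs."""
    A = np.ones((n, n))
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_update(H, A_norm, W):
    """One graph-convolutional layer: aggregate node features through the
    normalized adjacency, project with W, then apply ReLU."""
    return np.maximum(A_norm @ H @ W, 0.0)

rng = np.random.default_rng(0)
n_nodes, in_dim, out_dim = 6, 16, 8          # assumed sizes
A_norm = fully_connected_adjacency(n_nodes)  # every node attends to every node
H = rng.standard_normal((n_nodes, in_dim))   # hypothetical event-node features
W = rng.standard_normal((in_dim, out_dim))
H_next = gcn_update(H, A_norm, W)
```

Starting fully connected lets the subsequent graph-convolutional learning discover which event-to-event relationships matter, instead of fixing the edge set by hand.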
“…Although graphs have been widely employed to represent and analyze visual and textual data, their potential for representing audio data has received relatively little attention [33][34][35]. Nonetheless, audio data, ranging from speech signals to music recordings, inherently exhibit temporal dependencies and complex patterns that can be effectively captured and modeled using graph-based representations.…”
Section: Introduction
confidence: 99%