Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

Parekh, Jayneel; Parekh, Sanjeel; Mozharovskyi, Pavlo; d'Alché-Buc, Florence; Richard, Gaël

doi:10.31219/osf.io/4rtjs

Cited by 2 publications

(5 citation statements)

References 44 publications

“…with a two-step optimization process [8]. In this work, we consider the sparse NMF implementation [24] as suggested in [7]. In our segmentation proxy model, the dictionary W is pre-learned while the activation H is extracted by a neural model Ψ and referred to as an embedding.…”

Section: Non-negative Matrix Factorization (NMF) (mentioning)

confidence: 99%

“…In this work, the f model is pre-trained with frozen weights and serves as a teacher for the proxy model. We use a similar approach as [7] where the proxy model is composed of two functions. Let Ψ be a function that maps a sequence of D-dimension feature vectors S ∈ R^{D×T} to the embedding H ∈ R^{K×T}…”

Section: Proxy Model Framework (mentioning)

confidence: 99%
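
As a rough illustration of the factorization the two statements above refer to, the NumPy sketch below keeps a non-negative dictionary W fixed and estimates non-negative activations H such that S is approximately WH. It uses the standard multiplicative update for the Euclidean cost, which is a stand-in: in the cited work H is produced by a neural model Ψ and a sparse NMF variant is used. The helper name `nmf_activations`, the toy shapes, and the iteration count are assumptions made only for this example.

```python
import numpy as np

def nmf_activations(S, W, n_iter=200, eps=1e-10, seed=0):
    """Hypothetical helper: estimate H >= 0 with S ~= W @ H for a fixed W."""
    rng = np.random.default_rng(seed)
    K, T = W.shape[1], S.shape[1]
    H = rng.random((K, T)) + eps  # strictly positive initialisation
    for _ in range(n_iter):
        # Multiplicative update for the Frobenius-norm cost.
        # Every factor is non-negative, so H stays non-negative.
        H *= (W.T @ S) / (W.T @ W @ H + eps)
    return H

# Toy data (assumed shapes): D feature bins, K dictionary atoms, T frames.
rng = np.random.default_rng(1)
D, K, T = 64, 8, 100
W = rng.random((D, K))           # stands in for a pre-learned dictionary
S = W @ rng.random((K, T))       # synthetic non-negative "spectrogram"

H = nmf_activations(S, W)
rel_err = np.linalg.norm(S - W @ H) / np.linalg.norm(S)
print(H.shape, rel_err)
```

Only H is updated here, which mirrors the setting described in the statements where W is pre-learned and the embedding H is the quantity being estimated.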

“…Recently, a few explainable models have been developed like APNet [5], which extends the training of prototypes to the audio domain, and post-hoc visualization of explanations obtained from Shapley values [6]. In the architecture proposed in [7], the authors explain a black box audio classifier with a proxy model which is optimized to classify audio scenes while reconstructing the audio with the nonnegative matrix factorization framework [8].…”

Section: Introduction (mentioning)

confidence: 99%

“…We propose to train an explainable proxy model from a pretrained multilabel segmentation model (designated as the teacher). The architecture is inspired by [7]. The proxy is trained following a teacher-student approach, commonly used in knowledge distillation [21].…”

Section: Introduction (mentioning)

confidence: 99%

“…The former inputs a spectrogram and the latter uses the teacher's Wavlm outputs. Contrary to [7], we consider frame-level, i.e. time segmentation, instead of utterance-level classification.…”

Section: Introduction (mentioning)

confidence: 99%

An Explainable Proxy Model for Multilabel Audio Segmentation

Mariotte, Almudévar, Tahon et al. 2024

ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Audio Explainable Artificial Intelligence: A Review

Akman, Schuller 2024

Intell Comput

Artificial intelligence (AI) capabilities have grown rapidly with the introduction of cutting-edge deep-model architectures and learning strategies. Explainable AI (XAI) methods aim to make the capabilities of AI models beyond accuracy interpretable by providing explanations. The explanations are mainly used to increase model transparency, debug the model, and justify the model predictions to the end user. Most current XAI methods focus on providing visual and textual explanations that are prone to being present in visual media. However, audio explanations are crucial because of their intuitiveness in audio-based tasks and higher expressiveness than other modalities in specific scenarios, such as when understanding visual explanations requires expertise. In this review, we provide an overview of XAI methods for audio in 2 categories: exploiting generic XAI methods to explain audio models, and XAI methods specialised for the interpretability of audio models. Additionally, we discuss certain open problems and highlight future directions for the development of XAI techniques for audio modeling.
