2020
DOI: 10.48550/arxiv.2008.00582
Preprint

audioLIME: Listenable Explanations Using Source Separation

Abstract: Deep neural networks (DNNs) are successfully applied in a wide variety of music information retrieval (MIR) tasks, but their predictions are usually not interpretable. We propose audioLIME, a method based on Local Interpretable Model-agnostic Explanations (LIME), extended by a musical definition of locality. The perturbations used in LIME are created by switching on/off components extracted by source separation, which makes our explanations listenable. We validate audioLIME on two different music tagging system…
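The perturbation scheme the abstract describes can be sketched in a few lines: separate the input into components, switch subsets on and off, query the model on each reconstruction, and fit a linear surrogate over the on/off masks. This is a minimal sketch, not the paper's implementation: the function name `audiolime_sketch` and the `predict_fn` callback are hypothetical, the components are assumed to be pre-separated waveforms that sum to the input, and plain least squares stands in for LIME's locally weighted ridge surrogate.

```python
import numpy as np

def audiolime_sketch(components, predict_fn, n_samples=200, seed=0):
    """LIME-style importance weights over source-separated components.

    components: list of equal-length 1-D waveforms whose sum is the input.
    predict_fn: maps a waveform (1-D array) to a scalar model score.
    Returns one importance weight per component.
    """
    rng = np.random.default_rng(seed)
    k = len(components)
    # Binary masks: which components are switched "on" in each perturbation.
    masks = rng.integers(0, 2, size=(n_samples, k))
    masks[0] = 1  # keep the unperturbed input in the sample set
    comp = np.stack(components)  # shape (k, T)
    # Each perturbed input is the sum of the components left switched on;
    # these mixtures are what makes the explanation listenable.
    scores = np.array([float(predict_fn(m @ comp)) for m in masks])
    # Fit a linear surrogate via least squares (LIME proper uses a
    # locality-weighted ridge regression; this keeps the sketch minimal).
    X = np.column_stack([masks, np.ones(n_samples)])
    w, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return w[:k]  # importance of each separated source
```

Because the surrogate is fit on the binary masks, each returned weight reads as "how much switching this source on changes the model's score", and the corresponding component can be played back as the explanation.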

Cited by 4 publications (6 citation statements)
References 2 publications
“…The input is perturbed by switching "on/off" the individual segments. AudioLIME (Haunschmid et al, 2020; Chowdhury et al, 2021) proposes to separate the input using predefined sources to create the simplified representation. AudioLIME arguably generates more meaningful interpretations than SLIME as it relies on audio objects readily listenable for the end-user.…”
Section: Interpretability Methods For Audio
confidence: 99%
“…Particularly, APNet (Zinemanas et al, 2021) is not designed for post-hoc interpretations. AudioLIME (Haunschmid et al, 2020) is not applicable to our tasks as it requires known predefined audio sources. Moreover, SLIME (Mishra et al, 2020) and AudioLIME still rely on LIME (Ribeiro et al, 2016) for interpretations.…”
Section: Evaluation Metrics and Baselines
confidence: 99%
“…The input is perturbed by switching "on/off" the individual segments. AudioLIME [32], [33] proposed to separate the input using predefined sources to create the simplified representation. AudioLIME arguably generates more meaningful interpretations than SLIME as it relies on audio objects readily listenable for end-users.…”
Section: B. Audio Interpretability
confidence: 99%
“…In Raj et al (2019), probing on x-vectors trained solely to predict the speaker label revealed they also contain incidental information about the transcription, channel, or meta-information about the utterance. Probing Music Information Retrieval (MIR) predictions through Local Interpretable Model-Agnostic Explanations (LIME) by using AudioLIME (Haunschmid et al, 2020) helped interpret MIR for the first time. They demonstrated that the proposed AudioLIME produces listenable explanations that create trustworthy predictions for music tagging systems.…”
Section: Audio Probing
confidence: 99%