Sanjeel Parekh scite author profile

Motion informed audio source separation

In this paper we tackle the problem of single-channel audio source separation driven by descriptors of the sounding object's motion. As opposed to previous approaches, motion is included as a soft-coupling constraint within the nonnegative matrix factorization framework. The proposed method is applied to a multimodal dataset of string quartet performance recordings, where bow motion information is used to separate the string instruments. We show that the approach offers better source separation results than an audio-based baseline and state-of-the-art multimodal approaches on these very challenging music mixtures.
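
A minimal sketch of what a soft coupling can look like, assuming Euclidean NMF and the simplest possible penalty, lam * ||H - M||_F^2, which pulls the activations H toward a nonnegative motion-derived matrix M. The function name `motion_coupled_nmf`, the matrix `M` and the penalty form are illustrative assumptions, not the paper's formulation. With lam = 0 it reduces to plain NMF; as lam grows, H is forced onto M, which is the hard-constraint limit a soft coupling avoids.

```python
import numpy as np

def motion_coupled_nmf(V, M, lam=1.0, n_iter=200, seed=0, eps=1e-12):
    """Euclidean NMF V ~ W @ H with a soft coupling penalty lam * ||H - M||_F^2.

    V: nonnegative magnitude spectrogram (n_freq, n_frames).
    M: hypothetical nonnegative motion-derived activations (K, n_frames),
       e.g. one row per instrument built from bow-motion features.
    """
    rng = np.random.default_rng(seed)
    K, n_frames = M.shape
    W = rng.random((V.shape[0], K)) + eps
    H = rng.random((K, n_frames)) + eps
    losses = []
    for _ in range(n_iter):
        # Multiplicative update for H: numerator and denominator are the negative
        # and positive parts of the gradient of ||V - WH||^2 + lam * ||H - M||^2.
        H *= (W.T @ V + lam * M) / (W.T @ W @ H + lam * H + eps)
        # The penalty does not involve W, so W gets the standard Lee-Seung update.
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        losses.append(np.sum((V - W @ H) ** 2) + lam * np.sum((H - M) ** 2))
    return W, H, losses

# Toy check: two "sources" whose activations are observed noisily through motion.
rng = np.random.default_rng(1)
H_true = rng.random((2, 50))
V = rng.random((30, 2)) @ H_true
M = np.clip(H_true + 0.05 * rng.standard_normal(H_true.shape), 0.0, None)
W, H, losses = motion_coupled_nmf(V, M, lam=0.5)
```

Both updates keep W and H nonnegative because every factor in them is nonnegative, and the recorded objective should not increase from one iteration to the next.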

Weakly Supervised Representation Learning for Audio-Visual Scene Analysis

Parekh, Essid, Ozerov, et al. (2020)

IEEE/ACM Trans. Audio Speech Lang. Process.

Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance learning. We show that the learnt representations are useful for classifying events and localizing their characteristic audio-visual elements. The system is trained using only video-level event labels, without any timing information. An important feature of our method is its capacity to learn from unsynchronized audio-visual events. We achieve state-of-the-art results on a large-scale dataset of weakly labeled audio event videos. Visualizations of localized visual regions and audio segments substantiate our system's efficacy, especially in noisy situations where modality-specific cues appear asynchronously.
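
A minimal sketch of multiple instance learning (MIL) pooling for a single video, assuming each video is a bag of instances (audio segments and candidate image regions) that already carry per-class scores. Softmax-weighted pooling is one common MIL pooling choice and is used here as an assumption; the scores are toy numbers and names such as `mil_pool` are hypothetical, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mil_pool(instance_logits):
    """Pool (n_instances, n_classes) instance scores into one bag-level logit per class.

    The per-class softmax weights over instances double as a localization map.
    """
    weights = softmax(instance_logits, axis=0)
    return (weights * instance_logits).sum(axis=0), weights

# One video = one bag. Toy scores for 2 event classes; in a real system they
# would come from learned audio and visual encoders.
audio_logits = np.array([[2.0, -1.0], [-0.5, 0.3], [0.1, -2.0]])  # 3 audio segments
visual_logits = np.array([[1.5, -0.2], [-1.0, 0.8]])              # 2 image regions

# Each modality is pooled over its own instances before fusion, so the audio
# segment and the image region that drive a decision need not be co-timed.
audio_bag, audio_w = mil_pool(audio_logits)
visual_bag, visual_w = mil_pool(visual_logits)
probs = sigmoid(audio_bag + visual_bag)

# Training signal: video-level labels only, no timing information.
labels = np.array([1.0, 0.0])
loss = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))

# Localization: the most heavily weighted audio segment / image region per class.
top_segment = audio_w.argmax(axis=0)
top_region = visual_w.argmax(axis=0)
```

Only `labels` supervises the model, yet the pooling weights still point to the segment and region that contributed most to each class score.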

Guiding audio source separation by video object information

Parekh, Essid, Ozerov, et al. (2017)

Identify, Locate and Separate: Audio-Visual Object Extraction in Large Video Collections Using Weak Supervision

Parekh, Ozerov, Essid, et al. (2019)

We tackle the problem of audio-visual scene analysis for weakly labeled data. To this end, we build upon our previous audio-visual representation learning framework to perform object classification in noisy acoustic environments and to integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments on a challenging dataset of musical instrument performance videos. We also show encouraging visual object localization results.
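
One standard way to turn an NMF of the mixture into source enhancement is ratio masking with a subset of components. The sketch below assumes the NMF is already computed and that the components belonging to the target object are known; how they are chosen (here, by hand) is the part a weakly supervised system would have to learn. `enhance_from_components` is a hypothetical name, not the paper's code.

```python
import numpy as np

def enhance_from_components(X, W, H, selected, eps=1e-12):
    """Estimate one source from the mixture STFT using a subset of NMF components.

    X: complex mixture STFT (n_freq, n_frames); W (n_freq, K), H (K, n_frames): NMF of |X|.
    selected: indices of components attributed to the target object. Here they are
    given by hand; a weakly supervised system would derive them from learned scores.
    """
    target = W[:, selected] @ H[selected, :]
    mask = target / (W @ H + eps)   # ratio (Wiener-like) mask with values in [0, 1]
    return mask * X                 # the filtered spectrogram reuses the mixture phase

# Toy mixture whose magnitude is exactly W @ H, with random phase.
rng = np.random.default_rng(0)
W = rng.random((64, 4))
H = rng.random((4, 100))
X = (W @ H) * np.exp(1j * rng.uniform(-np.pi, np.pi, size=(64, 100)))

source = enhance_from_components(X, W, H, selected=[0, 2])
residual = enhance_from_components(X, W, H, selected=[1, 3])
```

Because the masks for complementary component sets sum to one, `source + residual` recovers the mixture up to the `eps` term, and no estimate is ever louder than the mixture in any time-frequency bin.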

Deep Pairwise Classification and Ranking for Predicting Media Interestingness

Parekh, Tibrewal, Parekh (2018)

Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF

Parekh, Parekh, Mozharovskyi, et al. (2022)

Preprint

This paper tackles post-hoc interpretability for audio processing networks. Our goal is to interpret decisions of a network in terms of high-level audio objects that are also listenable for the end-user. To this end, we propose a novel interpreter design that incorporates non-negative matrix factorization (NMF). In particular, a carefully regularized interpreter module is trained to take hidden layer representations of the targeted network as input and produce time activations of pre-learnt NMF components as intermediate outputs. Our methodology allows us to generate intuitive audio-based interpretations that explicitly enhance parts of the input signal most relevant for a network’s decision. We demonstrate our method’s applicability on popular benchmarks, including a real-world multi-label classification task.
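
A rough sketch of the interpreter idea, under heavy assumptions: a single linear layer with ReLU maps hidden-layer features to nonnegative time activations of a fixed dictionary W, the activations are time-averaged to predict the class, and the regularization mixes fidelity to the network's output, reconstruction through W and L1 sparsity. All weights are random stand-ins, and the function names, loss terms and weights (`alpha`, `beta`) are hypothetical rather than the paper's design.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def interpreter_forward(hidden, A, b, theta):
    """Map hidden features (n_frames, d) to NMF activations and class logits.

    A (d, K), b (K,) and theta (K, n_classes) are hypothetical interpreter weights;
    in practice they would be trained.
    """
    H_hat = np.maximum(hidden @ A + b, 0.0).T  # (K, n_frames); ReLU keeps H_hat >= 0
    pooled = H_hat.mean(axis=1)                # time-averaged activation per component
    logits = pooled @ theta                    # interpreter's own class prediction
    return H_hat, pooled, logits

def interpreter_loss(V, W, H_hat, logits, net_probs, alpha=1.0, beta=0.1):
    """Fidelity to the network + reconstruction through the fixed dictionary W + L1 sparsity."""
    fidelity = -np.sum(net_probs * np.log(softmax(logits) + 1e-12))
    recon = np.mean((V - W @ H_hat) ** 2)
    sparsity = np.mean(np.abs(H_hat))
    return fidelity + alpha * recon + beta * sparsity

def interpretation(W, H_hat, pooled, theta, cls, top_k=2):
    """Spectrogram built only from the components most relevant to class `cls`."""
    relevance = pooled * theta[:, cls]         # each component's contribution to the class logit
    top = np.argsort(relevance)[::-1][:top_k]
    return W[:, top] @ H_hat[top, :], top

# Toy usage with random stand-ins for a trained network and a pre-learnt dictionary.
rng = np.random.default_rng(0)
n_frames, d, K, n_freq, n_classes = 20, 16, 8, 32, 3
hidden = rng.standard_normal((n_frames, d))
W = rng.random((n_freq, K))                    # stands in for pre-learnt NMF components
V = rng.random((n_freq, n_frames))             # input magnitude spectrogram
A, b = 0.1 * rng.standard_normal((d, K)), np.zeros(K)
theta = rng.standard_normal((K, n_classes))
net_probs = np.array([0.7, 0.2, 0.1])          # the network output being explained

H_hat, pooled, logits = interpreter_forward(hidden, A, b, theta)
loss = interpreter_loss(V, W, H_hat, logits, net_probs)
spec, top = interpretation(W, H_hat, pooled, theta, cls=0)
```

Because the interpretation is built from dictionary atoms and their activations, it can be turned into a listenable signal, for example by using it as a mask on the input spectrogram.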

Multiview Approaches to Event Detection and Scene Analysis

Essid, Parekh, Duong, et al. (2017)

Nyquist Filter Design using POCS Methods: Including Constraints in Design

Parekh, Shah (2013)

Preprint
