Minz Won scite author profile

Minz Won

5 Publications

73 Citation Statements Received

52 Citation Statements Given


Affiliations

Pompeu Fabra University

Publications


Data-Driven Harmonic Filters for Audio Representation Learning

Won, Chun, Nieto, et al. 2020


We introduce a trainable front-end module for audio representation learning that exploits the inherent harmonic structure of audio signals. The proposed architecture, composed of a set of filters, compels the subsequent network to capture harmonic relations while preserving spectro-temporal locality. Since the harmonic structure is known to have a key role in human auditory perception, one can expect these harmonic filters to yield more efficient audio representations. Experimental results show that a simple convolutional neural network back-end with the proposed front-end outperforms state-of-the-art baseline methods in automatic music tagging, keyword spotting, and sound event tagging tasks.
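As a rough illustration of what a bank of harmonically related filters can look like, the sketch below places band-pass filters at integer multiples of a base frequency. The triangular shape, the fixed bandwidth, and the function names (`harmonic_centers`, `triangular_response`, `harmonic_filterbank`) are assumptions made for this example; the abstract only states that the front-end is trainable and composed of a set of filters, so in a trainable setting such parameters would presumably be learned rather than fixed.

```python
def harmonic_centers(f0, n_harmonics):
    """Center frequencies (Hz) at integer multiples of the base frequency f0."""
    return [f0 * n for n in range(1, n_harmonics + 1)]


def triangular_response(freq, center, bandwidth):
    """Gain in [0, 1]: 1 at the center, falling linearly to 0 at half the bandwidth away."""
    return max(0.0, 1.0 - 2.0 * abs(freq - center) / bandwidth)


def harmonic_filterbank(freqs, f0, n_harmonics, bandwidth):
    """One row per harmonic, one column per frequency bin (hypothetical layout)."""
    return [
        [triangular_response(f, c, bandwidth) for f in freqs]
        for c in harmonic_centers(f0, n_harmonics)
    ]


# Three frequency bins that coincide with the first three harmonics of 110 Hz.
bank = harmonic_filterbank([110.0, 220.0, 330.0], f0=110.0, n_harmonics=3, bandwidth=50.0)
```

Each row of `bank` responds only to the bin at its own harmonic, which is the sense in which grouping such rows lets a later network see harmonically related energy together.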


Toward Interpretable Music Tagging with Self-Attention

Won, Chun, Serra 2019

Preprint


Toward Universal Text-To-Music Retrieval

Doh, Won, Choi, et al. 2023


Multimodal Metric Learning for Tag-Based Music Retrieval

Won, Oramas, Nieto, et al. 2021


Tag-based music retrieval is crucial to browse large-scale music libraries efficiently. Hence, automatic music tagging has been actively explored, mostly as a classification task, which has an inherent limitation: a fixed vocabulary. On the other hand, metric learning enables flexible vocabularies by using pretrained word embeddings as side information. Also, metric learning has proven its suitability for cross-modal retrieval tasks in other domains (e.g., text-to-image) by jointly learning a multimodal embedding space. In this paper, we investigate three ideas to successfully introduce multimodal metric learning for tag-based music retrieval: elaborate triplet sampling, acoustic and cultural music information, and domain-specific word embeddings. Our experimental results show that the proposed ideas enhance the retrieval system quantitatively and qualitatively. Furthermore, we release the MSD500: a subset of the Million Song Dataset (MSD) containing 500 cleaned tags, 7 manually annotated tag categories, and user taste profiles.
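The metric-learning idea in this abstract can be sketched with a standard triplet loss, in which a tag embedding (the anchor) is pulled toward a track that carries the tag and pushed away from one that does not. This is a minimal sketch under stated assumptions: the Euclidean distance, the margin value, and the toy two-dimensional vectors are illustrative choices, not details taken from the paper.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    d_pos = math.dist(anchor, positive)  # anchor-to-positive distance
    d_neg = math.dist(anchor, negative)  # anchor-to-negative distance
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical embeddings in a shared tag/audio space.
tag = [0.0, 0.0]          # anchor: embedding of a tag
matching = [0.0, 1.0]     # positive: a track annotated with the tag
unrelated = [3.0, 4.0]    # negative: a track without the tag

well_separated = triplet_loss(tag, matching, unrelated)  # negative is far: no penalty
violating = triplet_loss(tag, unrelated, matching)       # roles swapped: penalized
```

Which tracks are chosen as positives and negatives determines how informative each triplet is, which is why the abstract lists triplet sampling among the ideas it investigates.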


Modeling Beats and Downbeats with a Time-Frequency Transformer

Hung, Wang, Song, et al. 2022


The Transformer is a successful deep neural network (DNN) architecture that has shown its versatility not only in natural language processing but also in music information retrieval (MIR). In this paper, we present a novel Transformer-based approach to tackle beat and downbeat tracking. This approach employs SpecTNT (Spectral-Temporal Transformer in Transformer), a variant of the Transformer that models both spectral and temporal dimensions of a time-frequency input of music audio. A SpecTNT model uses a stack of blocks, each of which consists of two levels of Transformer encoders. The lower-level (or spectral) encoder handles the spectral features and enables the model to pay attention to harmonic components of each frame. Since downbeats indicate bar boundaries and are often accompanied by harmonic changes, this step may help downbeat modeling. The upper-level (or temporal) encoder aggregates useful local spectral information to pay attention to beat/downbeat positions. We also propose an architecture that combines SpecTNT with a state-of-the-art model, Temporal Convolutional Networks (TCN), to further improve the performance. Extensive experiments demonstrate that our approach can significantly outperform TCN in downbeat tracking while maintaining comparable results in beat tracking.

