Supervised Metric Learning for Music Structure Features

Wang, Ju-Chiang; Smith, Jordan B. L.; Lu, Wei-Tsung; Song, Xuchen

doi:10.48550/arxiv.2110.09000

Cited by 1 publication (3 citation statements)

References 18 publications (29 reference statements)

“…Segmentation is usually based on criteria such as homogeneity, novelty, repetition and regularity [1]. When performed algorithmically, MSA often relies on similarity criteria within passages of a song summarized in an autosimilarity matrix [2][3][4][5][6][7][8], in which each coefficient represents an estimation of the similarity between two musical fragments.…”

Section: Introduction (mentioning)

confidence: 99%

“…While similarity between two frames can be obtained from the feature representation of the signal, such as the STFT of the song [2], recent works try to design new representations of the original music, able to capture the similarity between two frames while maintaining a high level of dissimilarity between dissimilar frames [1][3][4][5][6][7][8]. This generally consists in projecting the data in a new feature space and computing the similarity in the feature space.…”

Section: Introduction (mentioning)

confidence: 99%

“…Instead, the present work considers that repetitions are more prone to happen at the barscale, and hence focuses on barwise aligned features, as in [7]. A comparison between both has been made in [8], without one singling out.…”

Section: Introduction (mentioning)

confidence: 99%

Barwise Compression Schemes for Audio-Based Music Structure Analysis

Marmoret, Cohen, Bimbot

2022

Preprint

Music Structure Analysis (MSA) consists in segmenting a music piece in several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, linear and non-linear compression schemes can be applied to barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models such as Principal Component Analysis or Nonnegative Matrix Factorization and "piece-specific" Auto-Encoding Neural Networks, with the objective to learn latent representations specific to a given song. Such approaches do not rely on supervision nor annotations, which are well-known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of the barwise compression processing for MSA.
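The barwise compression idea in the abstract above, combined with the autosimilarity matrix mentioned in the citation statements, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `barwise_pca_autosimilarity`, the cosine similarity measure, and the synthetic input are all assumptions made for illustration. PCA is computed through a truncated SVD of the centered bar matrix, and the dynamic programming segmentation step is left out.

```python
import numpy as np


def barwise_pca_autosimilarity(bars, rank):
    """Compress barwise features with PCA and compare bars pairwise.

    bars: array of shape (n_bars, n_features), one flattened feature
          vector per bar (hypothetical input layout).
    rank: number of principal components to keep.
    Returns (compressed, autosim) with shapes (n_bars, rank) and
    (n_bars, n_bars).
    """
    bars = np.asarray(bars, dtype=float)
    # PCA via SVD of the mean-centered data matrix.
    centered = bars - bars.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    compressed = u[:, :rank] * s[:rank]
    # Cosine similarity between compressed bars (an assumed choice).
    norms = np.linalg.norm(compressed, axis=1, keepdims=True)
    unit = compressed / np.maximum(norms, 1e-12)
    autosim = unit @ unit.T
    return compressed, autosim


# Synthetic "song": four bars of one section, then four bars of another.
rng = np.random.default_rng(0)
section_a = rng.normal(size=32)
section_b = rng.normal(size=32)
bars = np.stack([section_a] * 4 + [section_b] * 4)
bars = bars + 0.01 * rng.normal(size=bars.shape)

compressed, autosim = barwise_pca_autosimilarity(bars, rank=2)
print(np.round(autosim, 2))
```

On this toy input, bars from the same section should show high similarity and bars from different sections low similarity, giving the block structure that a segmentation algorithm would then exploit.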
