A Streamlined Encoder/decoder Architecture for Melody Extraction

Hsieh, Tsung‐Han; Su, Li; Yang, Yi-Hsuan

doi:10.1109/icassp.2019.8682389

Cited by 48 publications

(82 citation statements)

References 16 publications


“…We compared our best melody extraction model, JDC_S(o_mv), with state-of-the-art methods using deep neural networks [17,18,21,22]. For a comparison of results under the same conditions, the test sets were ADC04, MIREX05, and MedleyDB for comparing other methods as mentioned in Section 3.1.2.…”

Section: Comparison With State-of-the-art Methods For Melody Extraction (mentioning)

confidence: 99%


Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks

Kum

Nam

2019

Applied Sciences


Singing melody extraction essentially involves two tasks: one is detecting the activity of a singing voice in polyphonic music, and the other is estimating the pitch of a singing voice in the detected voiced segments. In this paper, we present a joint detection and classification (JDC) network that conducts the singing voice detection and the pitch estimation simultaneously. The JDC network is composed of the main network that predicts the pitch contours of the singing melody and an auxiliary network that facilitates the detection of the singing voice. The main network is built with a convolutional recurrent neural network with residual connections and predicts pitch labels that cover the vocal range with a high resolution, as well as non-voice status. The auxiliary network is trained to detect the singing voice using multi-level features shared from the main network. The two optimization processes are tied with a joint melody loss function. We evaluate the proposed model on multiple melody extraction and vocal detection datasets, including cross-dataset evaluation. The experiments demonstrate how the auxiliary network and the joint melody loss function improve the melody extraction performance. Furthermore, the results show that our method outperforms state-of-the-art algorithms on the datasets.


“…Researchers have attempted various deep neural network architectures for melody extraction. Examples include fully-connected neural networks (FNN) [15,16], convolutional neural networks (CNN) [17,18], recurrent neural networks (RNN) [19], convolutional recurrent neural networks (CRNN) [20], and encoder-decoder [21,22].…”

Section: Introduction (mentioning)

confidence: 99%

Joint Detection and Classification of Singing Voice Melody Using Convolutional Recurrent Neural Networks

Kum

Nam

2019

Applied Sciences


“…Existing melody extraction algorithms can be roughly divided into three frameworks, i.e., pitch-salience based [2], source separation based [3,4] and data-driven based methods [5][6][7]. Source separation based methods have more potential to overcome the above difficulties and substantially foster the advances of melody extraction.…”

Section: Introduction (mentioning)

confidence: 99%

“…Recently, encoder-decoder architecture has demonstrated its powerful performance for VME. Lu et al [15] adopted an encoder-decoder network with dilated convolutions and Hsieh et al [6] constructed an encoder-decoder network with pooling indices. Simulating the process of semantic segmentation, they took the combined frequency and periodicity representation as inputs and outputted a two-dimensional salience image where frequency bins with maximum values per frame were selected.…”

Section: Introduction (mentioning)

confidence: 99%
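
The decoding step described in the statement above (picking, for each frame, the frequency bin with the maximum salience) can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function name `decode_melody`, the four-bin frequency grid, and the toy salience values are all assumptions made for the example, and voicing detection is left out entirely.

```python
def decode_melody(salience, bin_freqs_hz):
    """Select the highest-salience frequency bin in each frame.

    salience: list of frames, each a list of per-bin salience values.
    bin_freqs_hz: centre frequency of each bin (hypothetical grid).
    Returns one frequency per frame; ties resolve to the lowest bin.
    """
    melody = []
    for frame in salience:
        # Index of the maximum value in this frame (per-frame argmax).
        best_bin = max(range(len(frame)), key=frame.__getitem__)
        melody.append(bin_freqs_hz[best_bin])
    return melody


# Toy 3-frame, 4-bin salience "image" (values are made up).
bin_freqs_hz = [220.0, 246.9, 261.6, 293.7]
salience = [
    [0.1, 0.8, 0.2, 0.1],
    [0.1, 0.2, 0.9, 0.3],
    [0.6, 0.2, 0.1, 0.2],
]
melody = decode_melody(salience, bin_freqs_hz)
print(melody)
```

A per-frame argmax always returns a pitch, so a real system also needs some way to mark frames without melody; how that is done is outside what the statement describes.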

Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation

Gao

Zhang

2021

Electronics


Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, various instruments and singing voices are mixed according to harmonic structure, making it hard to identify the fundamental frequency (F0) of a singing voice. Therefore, reducing the interference of accompaniment is beneficial to pitch estimation of the singing voice. In this paper, we first adopted a high-resolution network (HRNet) to separate vocals from polyphonic music, then designed an encoder-decoder network to estimate the vocal F0 values. Experiment results demonstrate that the effectiveness of the HRNet-based singing voice separation method in reducing the interference of accompaniment on the extraction of vocal melody, and the proposed vocal melody extraction (VME) system outperforms other state-of-the-art algorithms in most cases.


“…Lu and Su addressed the melody extraction problem from the semantic segmentation on a time-frequency image perspective [15]. Afterwards, following Lu and Su's work, Hsieh et al added links between the pooling layers of the encoder and the un-pooling layers of the decoder to reduce convolution layers and simplify convolution modules [16]. The deep learning-based methods can automatically learn high-level features, according to the training data.…”

Section: Introduction (mentioning)

confidence: 99%
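
One common way to link an encoder's pooling layers to a decoder's un-pooling layers is to pass along the positions of the maxima found during pooling (the "pooling indices" another citation statement on this page mentions). The sketch below shows that idea in one dimension with plain Python lists. It is an illustration under stated assumptions, not the architecture from the paper: the function names, the window size of 2, and the input values are hypothetical.

```python
def max_pool_with_indices(x, size=2):
    """Non-overlapping 1-D max pooling that also records argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        # Offset of the maximum inside the window (first one on ties).
        offset = max(range(size), key=window.__getitem__)
        pooled.append(window[offset])
        indices.append(start + offset)
    return pooled, indices


def max_unpool(pooled, indices, length):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [0.0] * length
    for value, idx in zip(pooled, indices):
        out[idx] = value
    return out


x = [0.1, 0.9, 0.4, 0.3, 0.7, 0.2]
pooled, indices = max_pool_with_indices(x)   # encoder side
restored = max_unpool(pooled, indices, len(x))  # decoder side
print(pooled, indices, restored)
```

Because the decoder reuses the recorded positions instead of learning where to put each value, the up-sampling step itself needs no trainable parameters, which is consistent with the statement's point about simplifying the network.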

Efficient Melody Extraction Based on Extreme Learning Machine

Zhang

et al. 2020

Applied Sciences


Melody extraction is an important task in music information retrieval community and it is unresolved due to the complex nature of real-world recordings. In this paper, the melody extraction problem is addressed in the extreme learning machine (ELM) framework. More specifically, the input musical signal is first pre-processed to mimic the human auditory system. The music features are then constructed by constant-Q transform (CQT), and the concentration strategy is introduced to make use of contextual information. Afterwards, the rough melody pitches are determined by ELM network, according to its pre-trained parameters. Finally, the rough melody pitches are fine-tuned by the spectral peaks around the frame-wise rough pitches. The proposed method can extract melody from polyphonic music efficiently and effectively, where pitch estimation and voicing detection are conducted jointly. Some experiments have been conducted based on three publicly available datasets. The experimental results reveal that the proposed method achieves higher overall accuracies with very fast speed.
