2021
DOI: 10.3390/electronics10030298

Vocal Melody Extraction via HRNet-Based Singing Voice Separation and Encoder-Decoder-Based F0 Estimation

Abstract: Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, various instruments and singing voices are mixed according to harmonic structure, making it hard to identify the fundamental frequency (F0) of a singing voice. Therefore, reducing the interference of accompaniment is beneficial to pitch estimation of the singing voice. In this paper, we first adopted a high-resolution network (HRNet) to separate vocals from polyphonic musi…

Cited by 25 publications (8 citation statements)
References 18 publications (30 reference statements)
“…Recently, a frequency-temporal attention module was introduced in [19] to learn the relevant regions for predictions. Some special representations are proposed including HCQT [7], a combination of frequency and periodicity [20], and source-separated tracks [21,22].…”
Section: Related Work (mentioning)
confidence: 99%
“…SID is used in music library management to address the classification of songs by singers. Furthermore, the SID model is able to be used for downstream singing-related applications, such as similarity search, playlist generation, or song synthesis [4]- [9].…”
Section: Introduction (mentioning)
confidence: 99%
“…The task of singing voice synthesis is similar to the text-to-speech (TTS) in speech processing, and the synthesis speech is generated according to the given text. With the development of text-to-speech technology, many technologies [1]- [7] have been successfully applied to the task of singing voice synthesis. Both of the tasks of TTS and SVS encoded the lyrics or text into an acoustic variable, through a vocoder to synthesize the audio waveform.…”
Section: Introduction (mentioning)
confidence: 99%