2020
DOI: 10.1109/taslp.2020.2982285

SPICE: Self-Supervised Pitch Estimation

Cited by 52 publications
(50 citation statements)
References 27 publications
“…Before describing the proposed method in the next section, we here explain the backgrounds by reviewing previous studies. Input signal representations have been studied for music information processing, including the short-time Fourier transform (STFT) [17, 18], the constant-Q transform (CQT) [6], and the log Mel-scale filter-bank [19]. Recently, the harmonic CQT (HCQT) representation, which is obtained by stacking pitch-shifted (upshifted and downshifted) CQT spectrograms, has been proposed [3].…”
Section: Backgrounds (mentioning)
confidence: 99%
“…To estimate the semitone-level pitches and tatum-level onset and offset times of musical notes from music signals, one may estimate a singing F0 trajectory [3][4][5][6] and then quantize it on the semitone and tatum grids obtained by a beat-tracking method [7], where the tatum (e.g. 16th-note level) refers to the smallest meaningful subdivision of the main beat (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
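
As an illustration of the semitone quantization mentioned in the statement above, here is a minimal sketch. It is not the cited authors' method: it assumes the standard equal-tempered mapping with A4 = 440 Hz = MIDI note 69, treats non-positive F0 values as unvoiced, and leaves out the tatum-grid (beat-tracking) step entirely. The function names are hypothetical.

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz):
    """Map a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def quantize_to_semitone(f0_hz):
    """Round a frequency to the nearest equal-tempered semitone."""
    return int(round(hz_to_midi(f0_hz)))


def quantize_trajectory(f0_track):
    """Quantize each frame; frames with F0 <= 0 are marked None (assumed unvoiced)."""
    return [quantize_to_semitone(f) if f and f > 0 else None for f in f0_track]
```

For example, `quantize_trajectory([440.0, 0.0, 450.0])` maps the first and third frames to the same semitone and marks the middle frame as unvoiced.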
“…Robot [136, 174, 175, 176, 177, 178]; Computer vision [135, 136, 178, 179, 180, 181]; Natural language processing [182, 183]; Reinforcement…”
Section: Automatic Generation Of Label Data (mentioning)
confidence: 99%
“…Fundamental frequency (F0) estimates often serve as mid-level representation [1] in music information retrieval (MIR) tasks such as automatic music transcription [2] and performance analysis [3,4]. There exist a variety of approaches for monophonic F0-estimation, ranging from model-based methods [5][6][7] to more recent deep-learning-based methods [8,9]. A monophonic F0-estimation algorithm typically outputs one F0-value per time instance together with a confidence value that indicates the algorithm's certainty whether the sound source is active or not (sometimes referred to as "voicing").…”
Section: Introduction (mentioning)
confidence: 99%
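
The output format described in the statement above (one F0 value plus one confidence value per time instance) is commonly turned into a voiced/unvoiced decision by thresholding. The sketch below is only an illustration under assumptions: the function name `apply_voicing` and the default threshold of 0.5 are hypothetical and are not taken from the cited work or from any particular estimator.

```python
def apply_voicing(f0_hz, confidence, threshold=0.5):
    """Keep F0 values whose confidence reaches the threshold; mark the rest None.

    f0_hz and confidence are per-frame sequences of equal length.
    The threshold value is an assumption and would normally be tuned.
    """
    if len(f0_hz) != len(confidence):
        raise ValueError("f0_hz and confidence must have the same length")
    return [f if c >= threshold else None for f, c in zip(f0_hz, confidence)]
```

A call such as `apply_voicing([220.0, 221.0, 300.0], [0.9, 0.4, 0.5])` keeps the first and last frames and marks the second as unvoiced.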