2021
DOI: 10.1017/atsip.2021.4

Audio-to-score singing transcription based on a CRNN-HSMM hybrid model

Abstract: This paper describes an automatic singing transcription (AST) method that estimates a human-readable musical score of a sung melody from an input music signal. Because of the considerable pitch and temporal variation of a singing voice, a naive cascading approach that estimates an F0 contour and quantizes it with estimated tatum times cannot avoid many pitch and rhythm errors. To solve this problem, we formulate a unified generative model of a music signal that consists of a semi-Markov language model represen…
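To make the naive cascading baseline described in the abstract concrete, the sketch below quantizes an estimated F0 contour to the nearest semitone and assigns one pitch per tatum interval by majority vote. It is a minimal illustration of why such a cascade accumulates pitch and rhythm errors, not the paper's CRNN-HSMM method; the function names, the voting rule, and the handling of unvoiced frames are all assumptions.

```python
import numpy as np

def naive_cascade_transcription(f0_hz, times, tatum_times):
    """Naive cascading AST baseline (illustrative only):
    1) quantize each F0 frame to the nearest MIDI semitone,
    2) assign one pitch per tatum interval by majority vote.
    Vibrato, portamento, and timing deviations are quantized
    independently, which is where pitch/rhythm errors creep in."""
    # Frame-wise semitone quantization (0 Hz frames treated as unvoiced).
    midi = np.full_like(f0_hz, -1, dtype=int)
    voiced = f0_hz > 0
    midi[voiced] = np.round(69 + 12 * np.log2(f0_hz[voiced] / 440.0)).astype(int)

    notes = []  # (onset_time, offset_time, midi_pitch)
    for start, end in zip(tatum_times[:-1], tatum_times[1:]):
        in_beat = (times >= start) & (times < end)
        pitches = midi[in_beat]
        pitches = pitches[pitches >= 0]
        if len(pitches) == 0:
            continue  # no voiced frames in this tatum: treat as a rest
        # Majority vote over the tatum interval.
        values, counts = np.unique(pitches, return_counts=True)
        notes.append((start, end, int(values[np.argmax(counts)])))
    return notes
```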

Cited by 10 publications (2 citation statements)
References 24 publications
“…In [36], the author proposed a Bayesian hierarchical hidden semi-Markov model (HHSMM), which generates a note sequence and consists of three sub-models describing local keys, pitches, and onset score times. Later, a CRNN-HSMM hybrid model was proposed in [37], which estimates the most likely notes from the music signal using the Viterbi algorithm. This method improved the performance of AST and was superior to the HSMM-based method of [36], the most advanced method at that time.…”
Section: Singing Transcription
confidence: 99%
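As a rough illustration of the Viterbi decoding step mentioned in this statement, the generic sketch below finds the most likely sequence of pitch states from frame-wise log-posteriors and a transition matrix. It is not the CRNN-HSMM model of [37]; in particular, the explicit note-duration modeling of an HSMM is omitted, and the array shapes and state definitions are assumptions.

```python
import numpy as np

def viterbi_pitch_path(log_post, log_trans, log_init):
    """Generic Viterbi decoding (illustrative sketch, not the model of [37]).
    log_post : (T, K) frame-wise log-probabilities of K pitch states
    log_trans: (K, K) log transition matrix between pitch states
    log_init : (K,)  log initial state distribution
    Returns the most likely state index per frame (length T)."""
    T, K = log_post.shape
    delta = np.empty((T, K))
    backptr = np.zeros((T, K), dtype=int)
    delta[0] = log_init + log_post[0]
    for t in range(1, T):
        # scores[i, j]: best score of being in state i at t-1 and moving to j.
        scores = delta[t - 1][:, None] + log_trans
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0) + log_post[t]
    # Backtrack the most likely path.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for t in range(T - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path
```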
“…Multiple methods have been proposed for estimating notes from pitch posteriorgrams, e.g. using median filtering [11], hidden Markov models [16], or neural networks [20,21]. While most approaches consider each semitone independently, some approaches attempt to model the interactions between notes, using spectral likelihood models [1,18] or music language models [3,17].…”
Section: Background and Related Work
confidence: 99%
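Of the baseline techniques named in this statement, median filtering of a pitch posteriorgram [11] is perhaps the simplest; the sketch below is a hedged illustration of that idea, with the window length and voicing threshold chosen arbitrarily rather than taken from the cited work.

```python
import numpy as np
from scipy.ndimage import median_filter

def posteriorgram_to_pitch_track(posteriorgram, threshold=0.5, window=9):
    """Smooth a (T, K) pitch posteriorgram along time with a median filter,
    then take the per-frame argmax if it exceeds a voicing threshold.
    Window length and threshold are illustrative, not values from [11]."""
    # Median-filter each pitch bin along the time axis to suppress spurious frames.
    smoothed = median_filter(posteriorgram, size=(window, 1))
    best = np.argmax(smoothed, axis=1)
    conf = smoothed[np.arange(len(best)), best]
    # -1 marks frames judged unvoiced.
    return np.where(conf >= threshold, best, -1)
```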