2022
DOI: 10.5334/tismir.121
|View full text |Cite
|
Sign up to set email alerts
|

Voice Assignment in Vocal Quartets Using Deep Learning Models Based on Pitch Salience

Abstract: This paper deals with the automatic transcription of four-part, a cappella singing, audio performances. In particular, we exploit an existing, deep-learning based, multiple F0 estimation method and complement it with two neural network architectures for voice assignment (VA) in order to create a music transcription system that converts an input audio mixture into four pitch contours. To train our VA models, we create a novel synthetic dataset by collecting 5381 choral music scores from public-domain music arch… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
18
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
4
1

Relationship

0
5

Authors

Journals

citations
Cited by 5 publications
(18 citation statements)
references
References 18 publications
(15 reference statements)
0
18
0
Order By: Relevance
“…We assume that the fundamental frequencies for each of the J sources can be obtained from the mixture signal with a multiple F0 estimation system. Given that many such systems exist [45], [18], [46] and that it is still an active research area, we are confident that this is a reasonable assumption. When all F0s are obtained, each F0 value needs to be assigned to one specific source.…”
Section: B Parameter Estimationmentioning
confidence: 90%
See 4 more Smart Citations
“…We assume that the fundamental frequencies for each of the J sources can be obtained from the mixture signal with a multiple F0 estimation system. Given that many such systems exist [45], [18], [46] and that it is still an active research area, we are confident that this is a reasonable assumption. When all F0s are obtained, each F0 value needs to be assigned to one specific source.…”
Section: B Parameter Estimationmentioning
confidence: 90%
“…Section IV-B. F0 estimates are usually provided at a frame rate which is smaller than the sample rate [45], [18], [46]. Therefore, following [17], the source specific F0 time series are upsampled to the sample rate using bilinear interpolation.…”
Section: B Parameter Estimationmentioning
confidence: 99%
See 3 more Smart Citations