2022
DOI: 10.3389/frsip.2022.808594

A Deep-Learning Based Framework for Source Separation, Analysis, and Synthesis of Choral Ensembles

Abstract: Choral singing in the soprano, alto, tenor and bass (SATB) format is a widely practiced and studied art form with significant cultural importance. Despite the popularity of the choral setting, it has received little attention in the field of Music Information Retrieval. However, the recent publication of high-quality choral singing datasets, as well as recent developments in deep-learning-based methodologies applied to music and speech processing, have opened new avenues for research in this field.…

Cited by 8 publications (7 citation statements)
References 32 publications (41 reference statements)
“…Homogeneous audio sources are not easily distinguishable in the time-frequency domain and pose a permutation problem [20], [21]. While permutation-invariant training is used for supervised speech separation [21], [22], methods for musical homogeneous source separation exploit side-information such as F0 estimates [23], [11] or a musical score [24], [9], [10] to guide the separation.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…In this context, a choir is composed of four homogeneous sources: a soprano, alto, tenor, and a bass singer. Petermann et al. [23] modified the conditioned U-Net [25] so that the target source can be selected and separated using its F0 information. Results show that this leads to improved objective separation quality compared to using non-informed source-specific models.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Our models consistently outperform Petermann et al.'s [21] U-Net model in the SI-SDR metric. The best results were achieved … Evaluation of proposed approaches on CSD (Test dataset) using source separation and pitch accuracy metrics, for the BC1Song (a) and BCBSQ (b) training datasets.…”
Citation type: mentioning
confidence: 57%
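
For readers unfamiliar with the SI-SDR metric named in the statement above, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-distortion ratio. It is not taken from the cited papers: the function name `si_sdr`, the `eps` stabiliser, and the toy signals are assumptions made for illustration only. The estimate is projected onto the reference to find the optimally scaled target, and the metric is the energy ratio between that target and the residual, in decibels.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences.

    Hypothetical helper for illustration; `eps` only guards against
    division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    # Optimal scaling of the reference: alpha = <estimate, reference> / <reference, reference>
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / (ref_energy + eps)

    # Scaled target and the residual (everything in the estimate that is not target)
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy signals (made up): the second estimate is the first one scaled by 2.
reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 1.0, -1.0, -1.0]
scaled_estimate = [2.0 * x for x in estimate]

score = si_sdr(estimate, reference)
scaled_score = si_sdr(scaled_estimate, reference)
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR; higher values indicate better separation.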