2021
DOI: 10.1109/taslp.2021.3082331

Conditioned Source Separation for Musical Instrument Performances

Abstract: In music source separation, the number of sources may vary for each piece and some of the sources may belong to the same family of instruments, thus sharing timbral characteristics and making the sources more correlated. This leads to additional challenges in the source separation problem. This paper proposes a source separation method for multiple musical instruments sounding simultaneously and explores how much additional information apart from the audio stream can lift the quality of source separation. We e…

Cited by 23 publications (13 citation statements)
References 52 publications (139 reference statements)

“…Recently, there has been increased interest in TSE applications to speech [17], [38]- [41], music [15], [16], [18], [19], [42]- [46], and universal sounds [2], [11]- [14], [47], [48]. Various types of auxiliary clues have been proposed to identify the target in a sound mixture, including enrollment audio samples [12], [18], [19], [38], [39], [47], class labels [2], [11], [45], video signals of the target source [15], [42], [48], [49], and recently even onomatopoeia [14].…”
Section: B Target Sound Extraction (mentioning)
confidence: 99%
“…A lot of previous work exists for specific domains, such as separating speech from either other speech sources [12,13] or separating speech from other sounds [14,15]. Besides speech, there also has been much work towards extracting individual instrument tracks from music [16,17,2]. Another related line of research performs audio source separation with the help of other modalities, such as vision [3,18] or accelerometers [19].…”
Section: Related Work (mentioning)
confidence: 99%
“…There are various ways in which this task could be approached. Given the recent success of deep learning for audio processing [1,2,3,4], we propose to train a neural network for this task. A key aspect of our proposal is not to tie the sound to be extracted to any predefined collection of sound categories (such as, for example, the ontology defined by AudioSet [5]).…”
Section: Introduction (mentioning)
confidence: 99%
“…In our system, there is only a single ACCDOA vector output at each time instant, and the class represented by this output is determined based on an input describing the type of sound event we want to locate. Such conditioning-based approaches are often used in audio source separation [8] to isolate the sound from a specific musical instrument [9,10], speaker [11,12], or sound event [13]. Compared to a system that outputs all classes simultaneously, conditioning-based approaches require multiple passes at inference-time (one per class) in order to detect all classes, similarly to the class-specific models, although they do not require training multiple separate models.…”
Section: Introduction (mentioning)
confidence: 99%