Jointist: Joint Learning for Multi-instrument Transcription and Its Applications

Cheuk, Kin Wai; Choi, Keunwoo; Kong, Qiuqiang; Li, Bochen; Won, Minz; Hung, Amy; Wang, Ju-Chiang; Herremans, Dorien

doi:10.48550/arxiv.2206.10805

Cited by 2 publications

(4 citation statements)

References 47 publications

(71 reference statements)

“…The objective is to separate and extract the individual musical lines of each instrument, capturing their respective pitches, timings, and contributions to the overall musical texture. Recently, deep learning-based multiinstrument transcription [73][74][75][76][77] has been widely used and is crucial for analyzing and understanding the interactions between different instruments in a polyphonic musical piece. A recently published paper [73] jointly considers the instrument recognition module, the transcription module, and the source separation module, which is capable of transcribing, recognizing, and separating multiple musical instruments from the audio signal.…”

Section: Multi-instrument Transcription (mentioning)

confidence: 99%

“…Recently, deep learning-based multiinstrument transcription [73][74][75][76][77] has been widely used and is crucial for analyzing and understanding the interactions between different instruments in a polyphonic musical piece. A recently published paper [73] jointly considers the instrument recognition module, the transcription module, and the source separation module, which is capable of transcribing, recognizing, and separating multiple musical instruments from the audio signal. Similarly, the work in [74] adapts the concept of computer vision methods like multi-object detection and instance segmentation for multi-instrument note tracking.…”

Section: Multi-instrument Transcription (mentioning)

confidence: 99%

A Comprehensive Review on Music Transcription

Bhattarai, Lee

2023

Applied Sciences

Music transcription is the process of transforming recorded sound of musical performances into symbolic representations such as sheet music or MIDI files. Extensive research and development have been carried out in the field of music transcription and technology. This comprehensive review paper surveys the diverse methodologies, techniques, and advancements that have shaped the landscape of music transcription. The paper outlines the significance of music transcription in preserving, analyzing, and disseminating musical compositions across various genres and cultures. It also provides a historical perspective by tracing the evolution of music transcription from traditional manual methods to modern automated approaches. It also highlights the challenges in transcription posed by complex singing techniques, variations in instrumentation, ambiguity in pitch, tempo changes, rhythm, and dynamics. The review also categorizes four different types of transcription techniques, frame-level, note-level, stream-level, and notation-level, discussing their strengths and limitations. It also encompasses the various research domains of music transcription from general melody extraction to vocal melody, note-level monophonic to polyphonic vocal transcription, single-instrument to multi-instrument transcription, and multi-pitch estimation. The survey further covers a broad spectrum of music transcription applications in music production and creation. It also reviews state-of-the-art open-source as well as commercial music transcription tools for pitch estimation, onset and offset detection, general melody detection, and vocal melody detection. In addition, it also encompasses the currently available python libraries that can be used for music transcription. Furthermore, the review highlights the various open-source benchmark datasets for different areas of music transcription. It also provides a wide range of references supporting the historical context, theoretical frameworks, and foundational concepts to help readers understand the background of music transcription and the context of our paper.

“…[17] MIT 128 SelfAtt Self-supervised In-house dataset, Cerberus4 1 , etc. 2022 Cheuk et al [18] MIT+SS 128 CRNN Supervised Slakh 1…”

Section: Related Work (mentioning)

confidence: 99%

“…For example, Manilow et al trained a model on both MIT and audio source separation (SS) and found that it performed better on both tasks than the respective single-task models [13]. Cheuk et al used a similar approach and also showed that "jointly trained music transcription and music source separation models are beneficial to each other" [18]. Conversely, Cartwright et al performed both DTM and beat detection on datasets suited for only one of these tasks in order to expand the total amount of training data [9].…”

Section: Tasks and Vocabulary (mentioning)

confidence: 99%

High-Quality and Reproducible Automatic Drum Transcription from Crowdsourced Data

Zehren, Alunno, Bientinesi

2023

Signals

Within the broad problem known as automatic music transcription, we considered the specific task of automatic drum transcription (ADT). This is a complex task that has recently shown significant advances thanks to deep learning (DL) techniques. Most notably, massive amounts of labeled data obtained from crowds of annotators have made it possible to implement large-scale supervised learning architectures for ADT. In this study, we explored the untapped potential of these new datasets by addressing three key points: First, we reviewed recent trends in DL architectures and focused on two techniques, self-attention mechanisms and tatum-synchronous convolutions. Then, to mitigate the noise and bias that are inherent in crowdsourced data, we extended the training data with additional annotations. Finally, to quantify the potential of the data, we compared many training scenarios by combining up to six different datasets, including zero-shot evaluations. Our findings revealed that crowdsourced datasets outperform previously utilized datasets, and regardless of the DL architecture employed, they are sufficient in size and quality to train accurate models. By fully exploiting this data source, our models produced high-quality drum transcriptions, achieving state-of-the-art results. Thanks to this accuracy, our work can be more successfully used by musicians (e.g., to learn new musical pieces by reading, or to convert their performances to MIDI) and researchers in music information retrieval (e.g., to retrieve information from the notes instead of audio, such as the rhythm or structure of a piece).
