2021
DOI: 10.48550/arxiv.2111.03017
Preprint

MT3: Multi-Task Multitrack Music Transcription

Abstract: Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a challenging task at the core of music understanding. Unlike Automatic Speech Recognition (ASR), which typically focuses on the words of a single speaker, AMT often requires transcribing multiple instruments simultaneously, all while preserving fine-scale pitch and timing information. Further, many AMT datasets are "low-resource", as even expert musicians find music transcription difficult and time-consuming. Thus, prior work has …

Cited by 3 publications (6 citation statements)
References 26 publications (58 reference statements)

“…For f T , we propose a new instrument-wise metric to better capture the model performance for multi-instrument transcription. Existing literature uses mostly flat metrics or piece-wise evaluation [18,24,25,28]. Although this can provide a general idea of how good the transcription is, it does not show which musical instrument the model is particularly good or bad at.…”
Section: Discussion (mentioning)
confidence: 99%
“…Omnizart [16,29] is instrument-aware, but it does not scale up well when the number of musical instruments increases as discussed in Section 5.3. MT3 [18] is the current state-of-the-art MIAMT model. It formulates AMT as a sequence prediction task where the sequence consists of tokens of musical note representation.…”
Section: Multi-instrument Automatic Music Transcription (mentioning)
confidence: 99%
“…The grouped or separated stream typically corresponds to an individual instrument. Figure 1c shows an example of stream-level transcription which was obtained from a multi-task multitrack music transcription (MT3) model [31]. The estimated pitches and notes for each instrument in this model have been grouped into separate streams using various music transcription datasets.…”
Section: Stream-level Transcription (mentioning)
confidence: 99%
“…The power of the transformer model is currently being used in many different fields of artificial intelligence including automatic music transcription. Inspired by the successful sequence-to-sequence transfer learning in natural language processing, one of the recent works demonstrates the effectiveness of a general-purpose transformer model in transcribing various combinations of instruments across multiple datasets [31]. Additionally, another study takes the transformer model into account for the purpose of piano transcription.…”
Section: Future Directions (mentioning)
confidence: 99%