ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020
DOI: 10.1109/icassp40776.2020.9054567

Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?

Abstract: Background music affects lyrics intelligibility of singing vocals in a music piece. Automatic lyrics alignment and transcription in polyphonic music are challenging tasks because the singing vocals are corrupted by the background music. In this work, we propose to learn music genre-specific characteristics to train polyphonic acoustic models. We first compare several automatic speech recognition pipelines for the application of lyrics transcription. We then present the lyrics alignment and transcription perfor…

Cited by 36 publications (90 citation statements)
References 22 publications
“…Then a forward-pass decoding algorithm is applied on these posteriograms, obtaining phoneme alignments. Then using a language model (LM), phoneme posteriograms can be converted to word posteriograms to retrieve word-level alignments [1, 3,4]. One recent successful system [2] showed a considerable performance boost compared to previous research using an end-to-end approach trained on a large corpus, where alphabetic characters are used as sub-word units of speech.…”
Section: Related Work (mentioning)
confidence: 99%
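
The decoding step described in the statement above can be illustrated with a minimal sketch. It assumes the posteriogram is a frames-by-phonemes matrix of log posteriors and that the phoneme sequence of the lyrics is already known; it then finds the best monotonic alignment by Viterbi-style dynamic programming. The function name `forced_align`, the array layout, and the toy data are hypothetical, and the sketch is not the implementation used by any of the cited systems.

```python
import numpy as np


def forced_align(log_post, phoneme_ids):
    """Align a known phoneme sequence to frames (illustrative sketch).

    log_post: (T, P) array of per-frame log posteriors over P phonemes.
    phoneme_ids: phoneme indices in the order they are sung (length N <= T).
    Returns a list of (phoneme_id, start_frame, end_frame_exclusive).
    """
    T = log_post.shape[0]
    N = len(phoneme_ids)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phoneme")

    # Emission score of each sequence position at each frame: shape (T, N).
    emit = log_post[:, phoneme_ids]

    score = np.full((T, N), -np.inf)
    back = np.zeros((T, N), dtype=np.int64)  # 0 = stay, 1 = advance
    score[0, 0] = emit[0, 0]  # the path must start on the first phoneme

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            if move > stay:
                score[t, n] = move + emit[t, n]
                back[t, n] = 1
            else:
                score[t, n] = stay + emit[t, n]

    # Backtrace from the last phoneme at the last frame.
    n = N - 1
    position = np.empty(T, dtype=np.int64)
    for t in range(T - 1, -1, -1):
        position[t] = n
        n -= back[t, n]

    # Collapse runs of identical positions into segments.
    segments = []
    start = 0
    for t in range(1, T + 1):
        if t == T or position[t] != position[t - 1]:
            segments.append((phoneme_ids[position[start]], start, t))
            start = t
    return segments


# Toy posteriogram: 6 frames, 3 phonemes, with a clear winner per frame.
winners = [2, 2, 0, 0, 0, 1]
post = np.full((6, 3), 0.1)
post[np.arange(6), winners] = 0.8
alignment = forced_align(np.log(post), [2, 0, 1])
```

In this toy case, `alignment` assigns frames 0 to 1 to phoneme 2, frames 2 to 4 to phoneme 0, and frame 5 to phoneme 1. Word-level boundaries would follow by grouping consecutive phoneme segments according to a pronunciation lexicon, which is outside the scope of this sketch.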
“…In addition, the authors have used a public dataset [9] that is much smaller than the training set used in [2]. Gupta et al [3] reported state-of-the-art results using an acoustic model trained on polyphonic music using genre-specific phonemes. According to the authors, their system applies forced alignment with a large beam size as their system attempts to process the entire music recording at once.…”
Section: Related Work (mentioning)
confidence: 99%