ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2019.8682582
Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models

Cited by 26 publications (27 citation statements)
References 11 publications
“…With an acoustic model trained on solo-singing data, we can adapt the model towards the test data in two ways: (a) by making the test data closer to the trained solo-singing acoustic models through vocal separation of the polyphonic test data, or (b) by adapting the acoustic models to polyphonic data. In [11], the former approach was explored. However, source separation algorithms are known to introduce artifacts into the extracted vocals, so the pipeline becomes dependent on the reliability of the source separation algorithm.…”
Section: Model Adaptation For Domain Mismatch
confidence: 99%
“…To reduce the domain mismatch between solo-singing acoustic models and the polyphonic test data, we adopt three approaches: (a) vocal extraction of the polyphonic test data, as done in previous studies [4,7,11], (b) adapting the models with vocal-extracted polyphonic data, and (c) adapting the models with polyphonic data. We used DALI-train for adaptation, and DALI-dev to optimize the alignment performance (mean AE) by adjusting the initial learning rate (LR) and the number of epochs, as shown in Table 5.…”
Section: Performance On Polyphonic Audio
confidence: 99%
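The excerpt above describes adapting pretrained solo-singing acoustic models to polyphonic (or vocal-extracted) data by tuning the initial learning rate and number of epochs on a dev set. As a rough illustration only — the model, shapes, and hyperparameters below are hypothetical, not from the paper — the idea can be sketched as a few low-learning-rate gradient steps on in-domain frames, here with a linear softmax frame classifier in numpy:

```python
import numpy as np

def adapt_model(W, X, y, lr=1e-3, epochs=5):
    """Fine-tune a pretrained weight matrix W of a linear frame classifier
    on in-domain data (X: frames x feats, y: one-hot frames x classes).
    A small lr and few epochs, chosen on a dev set, stand in for the
    adaptation recipe described in the excerpt (illustrative sketch only)."""
    W = W.copy()
    for _ in range(epochs):
        logits = X @ W
        # numerically stable softmax over classes
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # gradient of mean softmax cross-entropy w.r.t. W
        grad = X.T @ (p - y) / len(X)
        W -= lr * grad
    return W
```

In practice the adapted model would be a neural acoustic model and the learning rate/epoch grid would be selected by alignment error on DALI-dev; this sketch only shows the update mechanics.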
“…However, as observed, the performance achieved with the CNN-based method is much poorer for singer identification. The work on singing-to-lyrics alignment in [34] showed that the effectiveness of the CNN-based method is closer to the Wave-U-Net-based approach than to harmonic/percussive audio-source separation. This shows that the CNN-based method can be effective for audio-source separation in tasks where speaker information is unimportant.…”
Section: Comparison To Various Audio-source Separation Methods For Pr
confidence: 99%
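The harmonic/percussive separation the excerpt compares against is classically done by median-filtering the magnitude spectrogram along time (harmonic) and frequency (percussive) and masking. A minimal sketch of that idea, assuming scipy's STFT utilities and illustrative parameter choices (the function name and kernel size are not from any cited paper):

```python
import numpy as np
from scipy.signal import medfilt2d, stft, istft

def hpss_vocal_emphasis(y, sr, kernel=17):
    """Rough harmonic/percussive split via median filtering of the
    magnitude spectrogram; the harmonic-dominant output is the kind of
    vocal-emphasised signal used as a preprocessing step for alignment.
    Sketch only: kernel size and soft mask are illustrative choices."""
    _, _, Z = stft(y, fs=sr, nperseg=1024)
    mag = np.abs(Z)
    # Median filter along time smooths out percussive spikes -> harmonic.
    harm = medfilt2d(mag, kernel_size=(1, kernel))
    # Median filter along frequency smooths out tonal ridges -> percussive.
    perc = medfilt2d(mag, kernel_size=(kernel, 1))
    # Soft mask keeps time-frequency bins where harmonic energy dominates.
    mask = harm / (harm + perc + 1e-10)
    _, y_harm = istft(Z * mask, fs=sr, nperseg=1024)
    return y_harm[: len(y)]
```

Learned separators such as Wave-U-Net replace this fixed filtering with a trained network, which is where the quality differences discussed in the excerpt arise.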
“…Audio source separation has been intensively studied and widely used for downstream tasks. For instance, various music information retrieval tasks, including lyric recognition and alignment [1][2][3], music transcription [4,5], instrument classification [6], and singing voice generation [7], rely on music source separation (MSS). Likewise, automatic speech recognition benefits from speech enhancement and speech separation.…”
Section: Introduction
confidence: 99%