ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2019.8682582
Automatic Lyrics-to-audio Alignment on Polyphonic Music Using Singing-adapted Acoustic Models

Cited by 26 publications (27 citation statements)
References 11 publications
“…With an acoustic model trained on solo-singing data, we can adapt the model towards the test data in two ways: (a) by making the test data closer to the trained solo-singing acoustic models through vocal separation of the polyphonic test data, or (b) by adapting the acoustic models to polyphonic data. In [11], the former approach was explored. However, source separation algorithms are known to introduce artifacts into the extracted vocals, so the pipeline becomes dependent on the reliability of the source separation algorithm.…”
Section: Model Adaptation For Domain Mismatch
confidence: 99%
“…To reduce the domain mismatch between solo-singing acoustic models and the polyphonic test data, we adopt three approaches: (a) vocal extraction of the polyphonic test data, as done in previous studies [4,7,11], (b) adapting the models with vocal-extracted polyphonic data, and (c) adapting the models with polyphonic data. We used DALI-train for adaptation, and DALI-dev to optimize the alignment performance (mean AE) by adjusting the initial learning rate (LR) and the number of epochs, as shown in Table 5.…”
Section: Performance On Polyphonic Audio
confidence: 99%
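The excerpt above describes adapting pretrained solo-singing acoustic models to polyphonic (or vocal-extracted) data by tuning the initial learning rate and number of epochs on a dev set. As a rough illustration only — the model, shapes, and hyperparameters below are hypothetical, not from the paper — the idea can be sketched as a few low-learning-rate gradient steps on in-domain frames, here with a linear softmax frame classifier in numpy:

```python
import numpy as np

def adapt_model(W, X, y, lr=1e-3, epochs=5):
    """Fine-tune a pretrained weight matrix W of a linear frame classifier
    on in-domain data (X: frames x feats, y: one-hot frames x classes).
    A small lr and few epochs, chosen on a dev set, stand in for the
    adaptation recipe described in the excerpt (illustrative sketch only)."""
    W = W.copy()
    for _ in range(epochs):
        logits = X @ W
        # numerically stable softmax over classes
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # gradient of mean softmax cross-entropy w.r.t. W
        grad = X.T @ (p - y) / len(X)
        W -= lr * grad
    return W
```

In practice the adapted model would be a neural acoustic model and the learning rate/epoch grid would be selected by alignment error on DALI-dev; this sketch only shows the update mechanics.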
“…However, as observed, the performance achieved with the CNN-based method is much poorer for singer identification. The work on singing-to-lyrics alignment in [34] showed that the effectiveness of the CNN-based method is closer to the Wave-U-Net-based approach than to harmonic/percussive audio-source separation. This shows that the CNN-based method can be effective for audio-source separation in tasks where speaker information is unimportant.…”
Section: Comparison To Various Audio-source Separation Methods For Pr
confidence: 99%
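The harmonic/percussive separation the excerpt compares against is classically done by median-filtering the magnitude spectrogram along time (harmonic) and frequency (percussive) and masking. A minimal sketch of that idea, assuming scipy's STFT utilities and illustrative parameter choices (the function name and kernel size are not from any cited paper):

```python
import numpy as np
from scipy.signal import medfilt2d, stft, istft

def hpss_vocal_emphasis(y, sr, kernel=17):
    """Rough harmonic/percussive split via median filtering of the
    magnitude spectrogram; the harmonic-dominant output is the kind of
    vocal-emphasised signal used as a preprocessing step for alignment.
    Sketch only: kernel size and soft mask are illustrative choices."""
    _, _, Z = stft(y, fs=sr, nperseg=1024)
    mag = np.abs(Z)
    # Median filter along time smooths out percussive spikes -> harmonic.
    harm = medfilt2d(mag, kernel_size=(1, kernel))
    # Median filter along frequency smooths out tonal ridges -> percussive.
    perc = medfilt2d(mag, kernel_size=(kernel, 1))
    # Soft mask keeps time-frequency bins where harmonic energy dominates.
    mask = harm / (harm + perc + 1e-10)
    _, y_harm = istft(Z * mask, fs=sr, nperseg=1024)
    return y_harm[: len(y)]
```

Learned separators such as Wave-U-Net replace this fixed filtering with a trained network, which is where the quality differences discussed in the excerpt arise.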
“…Audio source separation has been intensively studied and widely used for downstream tasks. For instance, various music information retrieval tasks, including lyric recognition and alignment [1][2][3], music transcription [4,5], instrument classification [6], and singing voice generation [7], rely on music source separation (MSS). Likewise, automatic speech recognition benefits from speech enhancement and speech separation.…”
Section: Introduction
confidence: 99%