2017
DOI: 10.1109/access.2017.2738558

Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

Abstract: Most of the previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system that innately suffered from several difficulties to adapt the speech model to individual singers. A significant aspect missing in previous works is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel part used is more consistent than the consonant part. Based on this, our system first learns a discriminative subspace of vowel sequences, based on weight…

Citations: cited by 10 publications (4 citation statements)
References: 27 publications (61 reference statements)

“…How do similarity models perform on multilingual data? To estimate the multilingual performance of published models trained on English data, we apply the NUS AutoLyricsAlign software on the JamendoLyrics Multi-Lang dataset, and evaluate the alignment. We find that the performance on English matches what was published by the authors [6].…”
Section: Results (mentioning)
confidence: 99%
“…For many years, the accuracy of systems for the automatic alignment of audio and lyrics content was well below the requirements for practical applications [1][2][3][4][5]. However, since 2019 there has been a resurgence of research and results improved by an order of magnitude.…”
Section: Introduction (mentioning)
confidence: 99%
“…Chien et al. [19] introduced an approach based on vowel likelihood models. Chang and Lee [20] used canonical time warping and repetitive vowel patterns to find the alignment for vowel sequence. Some other works achieved the alignment at music structure-level [21] or line-level [22].…”
Section: Related Work (mentioning)
confidence: 99%
“…Others focus on the alignments of vowels. For instance, the work in [25] combines non-negative matrix factorization and canonical time warping to discover repetitive acoustic patterns of vowels. In [26] the authors proposed to train vowel likelihood models to identify vowels and then to align syllables.…”
Section: Introduction (mentioning)
confidence: 99%