2021
DOI: 10.1016/j.specom.2021.07.002

NHSS: A speech and singing parallel database

Cited by 16 publications (10 citation statements)
References 41 publications
“…We selected these features by reviewing what past studies focused on for the analysis of song-speech comparison and prominently observed features in music (e.g. Fitch, 2006; Hansen et al, 2020; Hilton et al, 2022; Savage et al, 2015; Sharma et al, 2021, see the Supplementary Discussion section S1.1 for a more comprehensive literature review). Here, f0, rate of change of f0, and spectral centroid are extracted purely from acoustic signals, while IOI rate is based purely on manual annotations.…”
Section: Features (mentioning)
confidence: 99%
“…(emphasis added) Importantly, however, Savage et al's conclusion was based only on an analysis of music, thus the contrast with speech is speculative and not based on comparative data. Some studies have identified differences between speech and song in specific languages, such as song being slower and higher-pitched (Hansen et al, 2020; Merrill & Larrouy-Maestri, 2017; Sharma et al, 2021; Vanden Bosch der Nederlanden et al, 2022). However, a lack of annotated cross-cultural recordings of matched speaking and singing has hampered attempts to establish cross-cultural relationships between speech and song (cf.…”
Section: Introduction (mentioning)
confidence: 99%
“…Due to music copyright restrictions, one of the main hindrances for research in singing voice has been the lack of appropriately annotated publicly available datasets. In recent years, companies such as Smule Inc. and KaraFun have contributed datasets to the research community, and researchers have come together to prepare annotated datasets for the community such as NHSS [126], DALI [228], and NUS48E [97]. These datasets are used or can potentially be used for multiple tasks.…”
Section: Datasets For Singing Voice Research (mentioning)
confidence: 99%
“…Lyrics transcription in solo singing [209], [211], singer identification and query by singing [232], singing style and intonation pattern analysis [60], [233]
DSing (DAMP Sing! Lyrics Curated) [209] | 150 hours curated English songs data from the DAMP dataset; removed noisy data | https://github.com/groadabike/Kaldi-Dsing-task | Lyrics transcription in solo singing [209], [211], [215]
DAMP-VSEP [234] | 11,494 compositions (155 countries, 36 languages, 6456 artists) with backing tracks, one or more isolated vocals, and a mixture of the two | https://zenodo.org/record/3553059 | Singing voice separation [173]
DAMP Aligned [203] | 50 hours training data, 2.3 hours test; lyrics aligned and short segments | https://github.com/chitralekha18/lyrics-aligned-solo-singing-dataset | Lyrics transcription in solo singing [203], [226], [224]
DALI [228] | 134 hours English polyphonic song utterances with aligned lyrics | https://github.com/gabolsgabs/DALI | Lyrics transcription in polyphonic music
NUS48E [97] | 2.8 hours recordings of the sung and spoken lyrics of 48 (20 unique) English songs by 12 subjects and transcriptions and duration annotations at the phone-level | https://smcnus.comp.nus.edu.sg/nus-48e-sung-and-spoken-lyrics-corpus/ | Speech-singing conversion [235], singing synthesis, pronunciation evaluation [236], phoneme alignment in solo singing
NHSS [126] | 100 songs sung and spoken by 10 singers, resulting in total of 7 hours audio data | https://hltnus.github.io/NHSSDatabase/index.html | Speech-singing conversion, singing synthesis, lyrics alignment in solo singing
NUS48E+ SingEval [43] | 2 songs, 20 singers; music experts labels on pitch, rhythm, etc. | https://github.com/chitralekha18/ PESnQ APSIPA2017 | Singing skill evaluation [47], [43], [22], [59]
DAMP SingEval [39] | 400 renditions (4 songs, 100 singers per song), each rated by humans on the basis of singing quality | https://github.com/chitralekha18/SingEval.git | Singing skill evaluation [39], [58], [57],…”
Section: Datasets For Singing Voice Research (mentioning)
confidence: 99%
“…While significant progress has been achieved in automatic speech recognition (ASR) [1][2][3][4][5] and deep learning [6,7], lyrics transcription of polyphonic music remains unsolved. In recent years, there has been an increasing interest in lyrics recognition of polyphonic music, which has potential in many applications such as the automatic generation of karaoke lyrical content, music video subtitling, query-by-singing [8] and singing processing [9][10][11]. The goal of lyrics transcription of polyphonic music is to recognize the lyrics from a song that contains singing vocals mixed with background music.…”
Section: Introduction (mentioning)
confidence: 99%