NHSS: A speech and singing parallel database

Sharma, Bidisha; Gao, Xiaojian; Vijayan, Karthika; Tian, Xinmei; Li, Haizhou

doi:10.1016/j.specom.2021.07.002

Cited by 16 publications

(10 citation statements)

References 41 publications


“…We selected these features by reviewing what past studies focused on for the analysis of song-speech comparison and prominently observed features in music (e.g. Fitch, 2006; Hansen et al, 2020; Hilton et al, 2022; Savage et al, 2015; Sharma et al, 2021, see the Supplementary Discussion section S1.1 for a more comprehensive literature review). Here, f0, rate of change of f0, and spectral centroid are extracted purely from acoustic signals, while IOI rate is based purely on manual annotations.…”

Section: Features (mentioning)

confidence: 99%

“…(emphasis added) Importantly, however, Savage et al's conclusion was based only on an analysis of music, thus the contrast with speech is speculative and not based on comparative data. Some studies have identified differences between speech and song in specific languages, such as song being slower and higher-pitched (Hansen et al, 2020; Merrill & Larrouy-Maestri, 2017; Sharma et al, 2021; Vanden Bosch der Nederlanden et al, 2022). However, a lack of annotated cross-cultural recordings of matched speaking and singing has hampered attempts to establish cross-cultural relationships between speech and song (cf.…”

Section: Introduction (mentioning)

confidence: 99%


Globally, songs and instrumental melodies are slower and higher and use more stable pitches than speech: A Registered Report

Ozaki, Tierney, Pfordresher et al. 2022

Preprint

What, if any, similarities and differences between song and speech are consistent across cultures? Both song and speech are found in all known human societies and are argued to share evolutionary roots and cognitive resources, yet no studies have compared similarities and differences between song and speech across languages on a global scale. We will compare sets of matched song/speech recordings produced by our 81 coauthors whose 1st/heritage languages span 23 language families. Each recording set consists of singing, recited lyrics, and spoken description, plus an optional instrumental version of the sung melody to allow us to capture a “musi-linguistic continuum” from instrumental music to naturalistic speech. Our literature review and pilot analysis using five audio recording sets (by speakers of Japanese, English, Farsi, Yoruba, and Marathi) led us to make six predictions for confirmatory analysis comparing song vs. spoken descriptions: three consistent differences and three consistent similarities. For differences, we predict that: 1) songs will have higher pitch than speech, 2) songs will be slower than speech, and 3) songs will have more stable pitch than speech. For similarities, we predict that 4) pitch interval size, 5) timbral brightness, and 6) pitch declination will be similar for song and speech. Because our opportunistic language sample (approximately half are Indo-European languages) and unusual design involving coauthors as participants (approximately 1/5 of coauthors had some awareness of our hypotheses when we recorded our singing/speaking) could affect our results, we will include robustness analyses to ensure our conclusions are robust to these biases, should they exist. Other features (e.g., rhythmic isochronicity, loudness) and comparisons involving instrumental melodies and recited lyrics will be investigated through post-hoc exploratory analyses. 
Our sample size of n=80 people providing sung/spoken recordings already exceeds the required number of recordings (i.e. 60) to achieve 95% power with the alpha level of 0.05 for the hypothesis testing of the selected six features. Our study will provide diverse cross-linguistic empirical evidence regarding the existence of cross-cultural regularities in song and speech, shed light on factors shaping humanity’s two universal vocal communication forms, and provide rich cross-cultural data to generate new hypotheses and inform future analyses of other factors (e.g., functional context, sex, age, musical/linguistic experience) that may shape global musical and linguistic diversity.


“…Due to music copyright restrictions, one of the main hindrances for research in singing voice has been the lack of appropriately annotated publicly available datasets. In recent years, companies such as Smule Inc. and KaraFun have contributed datasets to the research community, and researchers have come together to prepare annotated datasets for the community such as NHSS [126], DALI [228], and NUS48E [97]. These datasets are used or can potentially be used for multiple tasks.…”

Section: Datasets For Singing Voice Research (mentioning)

confidence: 99%

“…Lyrics transcription in solo singing [209], [211], singer identification and query by singing [232], singing style and intonation pattern analysis [60], [233]
DSing (DAMP Sing! Lyrics Curated) [209] | 150 hours curated English songs data from the DAMP dataset; removed noisy data | https://github.com/groadabike/Kaldi-Dsing-task | Lyrics transcription in solo singing [209], [211], [215]
DAMP-VSEP [234] | 11,494 compositions (155 countries, 36 languages, 6456 artists) with backing tracks, one or more isolated vocals, and a mixture of the two | https://zenodo.org/record/3553059 | Singing voice separation [173]
DAMP Aligned [203] | 50 hours training data, 2.3 hours test; lyrics aligned and short segments | https://github.com/chitralekha18/lyrics-aligned-solo-singing-dataset | Lyrics transcription in solo singing [203], [226], [224]
DALI [228] | 134 hours English polyphonic song utterances with aligned lyrics | https://github.com/gabolsgabs/DALI | Lyrics transcription in polyphonic music
NUS48E [97] | 2.8 hours recordings of the sung and spoken lyrics of 48 (20 unique) English songs by 12 subjects and transcriptions and duration annotations at the phone-level | https://smcnus.comp.nus.edu.sg/nus-48e-sung-and-spoken-lyrics-corpus/ | Speech-singing conversion [235], singing synthesis, pronunciation evaluation [236], phoneme alignment in solo singing
NHSS [126] | 100 songs sung and spoken by 10 singers, resulting in total of 7 hours audio data | https://hltnus.github.io/NHSSDatabase/index.html | Speech-singing conversion, singing synthesis, lyrics alignment in solo singing
NUS48E+ SingEval [43] | 2 songs, 20 singers; music experts labels on pitch, rhythm, etc. | https://github.com/chitralekha18/ PESnQ APSIPA2017 | Singing skill evaluation [47], [43], [22], [59]
DAMP SingEval [39] | 400 renditions (4 songs, 100 singers per song), each rated by humans on the basis of singing quality | https://github.com/chitralekha18/SingEval.git | Singing skill evaluation [39], [58], [57],…”

Section: Datasets For Singing Voice Research (mentioning)

confidence: 99%

Deep Learning Approaches in Topics of Singing Information Processing

Gupta, Goto 2022

IEEE/ACM Trans. Audio Speech Lang. Process.

Self Cite

Singing, the vocal production of musical tones, is one of the most important elements of music. Addressing the needs of real-world applications, the study of technologies related to singing voices has become an increasingly active area of research. In this paper, we provide a comprehensive overview of the recent developments in the field of singing information processing, specifically in the topics of singing skill evaluation, singing voice synthesis, singing voice separation, and lyrics synchronization and transcription. We will especially focus on deep learning approaches including modern representation learning techniques for singing voices. We will also provide an overview of contributions in public datasets for singing voice research.


“…While significant progress has been achieved in automatic speech recognition (ASR) [1][2][3][4][5] and deep learning [6,7], lyrics transcription of polyphonic music remains unsolved. In recent years, there has been an increasing interest in lyrics recognition of polyphonic music, which has potential in many applications such as the automatic generation of karaoke lyrical content, music video subtitling, query-by-singing [8] and singing processing [9][10][11]. The goal of lyrics transcription of polyphonic music is to recognize the lyrics from a song that contains singing vocals mixed with background music.…”

Section: Introduction (mentioning)

confidence: 99%

Music-robust Automatic Lyrics Transcription of Polyphonic Music

Gao, Gupta, Li 2022

Preprint

Self Cite

Lyrics transcription of polyphonic music is challenging because singing vocals are corrupted by the background music. To improve the robustness of lyrics transcription to the background music, we propose a strategy of combining the features that emphasize the singing vocals, i.e. music-removed features that represent singing vocal extracted features, and the features that capture the singing vocals as well as the background music, i.e. music-present features. We show that these two sets of features complement each other, and their combination performs better than when they are used alone, thus improving the robustness of the acoustic model to the background music. Furthermore, language model interpolation between a general-purpose language model and an in-domain lyrics-specific language model provides further improvement in transcription results. Our experiments show that our proposed strategy outperforms the existing lyrics transcription systems for polyphonic music. Moreover, we find that our proposed music-robust features specially improve the lyrics transcription performance in metal genre of songs, where the background music is loud and dominant.
