What, if any, similarities and differences between song and speech are consistent across cultures? Both song and speech are found in all known human societies and are argued to share evolutionary roots and cognitive resources, yet no studies have compared the two across languages on a global scale. We will compare sets of matched song/speech recordings produced by our 81 coauthors, whose first/heritage languages span 23 language families. Each recording set consists of singing, recited lyrics, and spoken description, plus an optional instrumental version of the sung melody, allowing us to capture a “musi-linguistic continuum” from instrumental music to naturalistic speech. Our literature review and pilot analysis using five audio recording sets (by speakers of Japanese, English, Farsi, Yoruba, and Marathi) led us to make six predictions for confirmatory analysis comparing song vs. spoken descriptions: three consistent differences and three consistent similarities. For differences, we predict that: 1) songs will have higher pitch than speech, 2) songs will be slower than speech, and 3) songs will have more stable pitch than speech. For similarities, we predict that 4) pitch interval size, 5) timbral brightness, and 6) pitch declination will be similar for song and speech. Because our opportunistic language sample (approximately half are Indo-European languages) and unusual design involving coauthors as participants (approximately one fifth of coauthors had some awareness of our hypotheses when we recorded our singing/speaking) could affect our results, we will include robustness analyses to check whether our conclusions hold despite these potential biases. Other features (e.g., rhythmic isochronicity, loudness) and comparisons involving instrumental melodies and recited lyrics will be investigated through post-hoc exploratory analyses.
Our sample of n = 80 people providing sung/spoken recordings already exceeds the 60 recordings required to achieve 95% power at an alpha level of 0.05 for hypothesis tests of the six selected features. Our study will provide diverse cross-linguistic empirical evidence regarding the existence of cross-cultural regularities in song and speech, shed light on factors shaping humanity’s two universal vocal communication forms, and provide rich cross-cultural data to generate new hypotheses and inform future analyses of other factors (e.g., functional context, sex, age, musical/linguistic experience) that may shape global musical and linguistic diversity.
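The abstract does not spell out how the 60-recording requirement was derived. As an illustration only, here is a Monte Carlo sketch of statistical power for a two-sided paired t-test at n = 60, assuming a medium standardized effect (d = 0.5); the registered analysis may well use a different test, effect size, or correction for multiple comparisons.

```python
import random
import statistics

def paired_t_power(n=60, effect_size=0.5, alpha=0.05, n_sims=2000, seed=1):
    """Monte Carlo power estimate for a two-sided paired t-test.

    Simulates per-pair standardized differences (e.g., song pitch minus
    speech pitch) drawn from N(effect_size, 1), and counts how often the
    t statistic exceeds the critical value.
    """
    rng = random.Random(seed)
    t_crit = 2.001  # approx. two-sided critical value for df = 59, alpha = .05
    hits = 0
    for _ in range(n_sims):
        diffs = [rng.gauss(effect_size, 1.0) for _ in range(n)]
        mean = statistics.fmean(diffs)
        sd = statistics.stdev(diffs)
        t = mean / (sd / n ** 0.5)
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims

power = paired_t_power()
print(round(power, 2))
```

Under these assumed parameters the estimated power comfortably exceeds 0.95, consistent with the abstract's claim that 60 recordings suffice.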
We present a platform and a dataset to support research on Music Emotion Recognition (MER). We developed the Music Enthusiasts platform to improve the gathering and analysis of the so-called “ground truth” needed as input to MER systems. Firstly, our platform engages participants using citizen science strategies to generate music emotion annotations – the platform presents didactic information and musical recommendations as incentivization, and collects data regarding demographics, mood, and language from each participant. Participants annotated each music excerpt with single free-text emotion words (in their native language), distinct forced-choice emotion categories, preference, and familiarity. Additionally, participants stated the reasons for each annotation – including those distinctive of emotion perception and emotion induction. Secondly, our dataset was created for personalized MER and contains information from 181 participants, 4721 annotations, and 1161 music excerpts. To showcase the use of the dataset, we present a methodology for personalization of MER models based on active learning. The experiments show evidence that using the judgment of the crowd as prior knowledge for active learning allows for more effective personalization of MER systems for this particular dataset. Our dataset is publicly available and we invite researchers to use it for testing MER systems.
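The abstract does not detail its active-learning procedure; as a generic illustration of the family of methods it refers to, here is a minimal uncertainty-sampling sketch: the learner queries the user to annotate the excerpt whose current predicted emotion distribution is most uncertain (maximum entropy). All names and distributions below are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool, predict_proba):
    """Uncertainty sampling: pick the unlabeled excerpt whose predicted
    emotion distribution has maximum entropy, to query the user next."""
    return max(pool, key=lambda x: entropy(predict_proba(x)))

# Toy example: three excerpts with hypothetical predicted distributions
# over two emotion classes.
preds = {
    "excerpt_a": [0.9, 0.1],    # model is confident
    "excerpt_b": [0.55, 0.45],  # model is uncertain -> should be queried
    "excerpt_c": [0.8, 0.2],
}
chosen = select_most_uncertain(preds, lambda x: preds[x])
print(chosen)  # excerpt_b
```

The crowd-as-prior idea mentioned in the abstract would enter by initializing `predict_proba` from aggregate annotations rather than from an untrained model.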
The understanding of emotions in music has motivated research across diverse areas of knowledge for decades. In the field of computer science, there is particular interest in developing algorithms to "predict" the emotions in music perceived by or induced in a listener. However, gathering reliable "ground truth" data for modeling the emotional content of music poses challenges, since tasks related to annotating emotions are time-consuming, expensive, and cognitively demanding due to their inherent subjectivity and cross-disciplinary nature. Citizen science projects have proven to be a useful approach to these types of problems, where collaborators must be recruited for massive-scale tasks. We developed a platform for annotating emotional content in musical pieces following a citizen science approach, serving not only the researchers, who gain the generated dataset, but also the volunteers, who are engaged in the research project both by providing annotations and through their self- and community-awareness of the emotional perception of music. Likewise, gamification mechanisms motivate participants to explore and discover new music based on its emotional content. Preliminary user evaluations showed that the platform design is in line with the motivations of the general public, and that the citizen science approach offers an iterative refinement to enhance the quantity and quality of contributions by involving volunteers in the design process. The usability of the platform was acceptable, although some of the features require improvements.
Our previous research showed promising results when transferring features learned from speech to train emotion recognition models for music. In this context, we implemented a denoising autoencoder as a pretraining approach to extract features from speech in two languages (English and Mandarin). From that, we performed transfer and multi-task learning to predict classes from the arousal-valence space of music emotion. We tested and analyzed intra-linguistic and cross-linguistic settings, depending on the language of the speech and of the music's lyrics. This paper presents additional investigation of our approach, which reveals that: (1) pretraining with speech in a mixture of languages yields results similar to pretraining on specific languages, suggesting the pretraining phase does not exploit language-specific features; (2) the Mandarin-language music dataset consistently results in poor classification performance, and we found low agreement in its annotations; and (3) novel methodologies for representation learning (Contrastive Predictive Coding) may exploit features from both languages (i.e., pretraining on a mixture of languages) and improve classification of music emotions in both languages. From this study we conclude that more research is still needed to understand what is actually being transferred in these types of contexts.
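The pretrain-then-transfer pipeline described above can be illustrated, in heavily simplified form, with a toy one-parameter "denoising autoencoder": a single weight is trained to reconstruct clean values from noisy inputs, then frozen and reused as a feature extractor downstream. This is a conceptual sketch only, not the authors' architecture, which uses real neural autoencoders over speech features.

```python
import random

rng = random.Random(0)

# 1) Pretraining: learn w so that w * noisy_x approximates the clean x
#    (a one-weight analogue of a denoising autoencoder), via SGD on
#    squared reconstruction error.
w = 0.1
for _ in range(500):
    x = rng.gauss(0.0, 1.0)          # clean "speech feature"
    noisy = x + rng.gauss(0.0, 0.3)  # corrupted input
    pred = w * noisy
    grad = 2 * (pred - x) * noisy    # d/dw of (pred - x)**2
    w -= 0.01 * grad

# 2) Transfer: freeze w and reuse it as the encoder for a downstream
#    task (here, a trivial sign classifier standing in for music
#    emotion classification).
def classify(noisy_x):
    return 1 if w * noisy_x > 0 else 0

print(w)  # converges near the denoising-optimal weight, 1 / 1.09
```

The optimal denoising weight here is 1/(1 + 0.09) ≈ 0.92, since the noise variance shrinks the best linear reconstruction; the learned `w` hovers around that value.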