We examine the similarities and differences in the expression of emotion in the singing and the speaking voice. Three internationally renowned opera singers produced "vocalises" (using a schwa vowel) and short nonsense phrases in different interpretations for 10 emotions. Acoustic analyses of emotional expression in the singing samples show significant differences between the emotions. In addition to the obvious effects of loudness and tempo, spectral balance and perturbation make significant contributions (high effect sizes) to this differentiation. A comparison of the emotion-specific patterns produced by the singers in this study with published data for professional actors portraying different emotions in speech generally shows a very high degree of similarity. However, singers tend to rely more than actors on voice perturbation, specifically vibrato, particularly for high-arousal emotions. It is suggested that this may be due to the restrictions and constraints imposed by the musical structure.
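The acoustic parameters named above (loudness, tempo, spectral balance, perturbation) are not defined in detail in this abstract. The following is a minimal sketch of how proxy descriptors for two of them could be computed with librosa; RMS energy stands in for loudness and the spectral centroid for spectral balance, which are assumptions rather than the authors' exact measures. Perturbation measures such as jitter, shimmer, or vibrato extent would additionally require pitch tracking (e.g., with Praat).

```python
# Hedged sketch: proxy acoustic descriptors, not the study's exact features.
import librosa
import numpy as np

def proxy_descriptors(path: str, sr: int = 16000) -> dict:
    y, sr = librosa.load(path, sr=sr)
    rms = librosa.feature.rms(y=y)[0]                             # frame-wise energy (loudness proxy)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]   # spectral balance proxy
    return {
        "mean_rms_db": float(20 * np.log10(np.mean(rms) + 1e-10)),
        "mean_spectral_centroid_hz": float(np.mean(centroid)),
    }
```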
We investigate the automatic recognition of emotions in the singing voice and study the relevance and role of a variety of acoustic parameters. The data set contains phrases and vocalises sung by eight renowned professional opera singers in ten different emotions and a neutral state. The states are mapped to ternary arousal and valence labels. We propose a small set of relevant acoustic features based on our previous findings on the same data and compare it with a large-scale state-of-the-art feature set for paralinguistics recognition, the baseline feature set of the Interspeech 2013 Computational Paralinguistics ChallengE (ComParE). A feature importance analysis with respect to classification accuracy and correlation of features with the targets is provided. Results show that the classification performance with both feature sets is similar for arousal, while the ComParE set is superior for valence. Intra-singer feature ranking criteria further improve the classification accuracy significantly in a leave-one-singer-out cross-validation.
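As a rough illustration of the evaluation protocol described above, the sketch below runs a leave-one-singer-out cross-validation with scikit-learn, assuming a feature matrix X (one row per sung sample), ternary arousal labels y, and a vector identifying which of the eight singers produced each sample. A linear SVM is used as a stand-in classifier (linear SVMs are the usual ComParE baseline); the feature matrices, labels, and hyperparameters are placeholders, not those of the paper.

```python
# Minimal leave-one-singer-out cross-validation sketch (illustrative only).
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def loso_accuracy(X: np.ndarray, y: np.ndarray, singers: np.ndarray) -> float:
    # Standardize features, then classify; each fold holds out one singer entirely.
    clf = make_pipeline(StandardScaler(), LinearSVC(C=0.01, max_iter=10000))
    scores = cross_val_score(clf, X, y, groups=singers, cv=LeaveOneGroupOut())
    return float(scores.mean())
```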
The perception of modal and falsetto registers was analyzed in a material consisting of a total of 104 vowel sounds sung by 13 choir singers, 52 sung in modal register and 52 in falsetto register. These vowel sounds were classified by 16 expert listeners in a forced-choice test, and the number of votes for modal was compared to the voice source parameters: (1) closed quotient (Qclosed), (2) level difference between the two lowest source spectrum partials (H1-H2), (3) AC amplitude, (4) maximum flow declination rate (MFDR), and (5) normalized amplitude quotient (NAQ = (AC amplitude/MFDR) x fundamental frequency). Tones with a high value of Qclosed and low values of H1-H2 and of NAQ were typically associated with a high number of votes for modal register, and vice versa, with Qclosed showing the strongest correlation. Some singer subjects produced tones that could not be classified as either falsetto or modal register, suggesting that classification of registers is not always feasible.
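For clarity, the NAQ definition cited above can be written as a small worked example: the AC flow amplitude is divided by MFDR and multiplied by the fundamental frequency (equivalently, divided by MFDR times the fundamental period), which makes the quotient dimensionless. The numeric values below are purely illustrative and do not come from the study.

```python
# NAQ = (AC amplitude / MFDR) * f0  ==  AC amplitude / (MFDR * period)
def normalized_amplitude_quotient(ac_amplitude: float, mfdr: float, f0: float) -> float:
    return (ac_amplitude / mfdr) * f0

# Illustrative example: AC amplitude 0.3 L/s, MFDR 300 L/s^2, f0 = 220 Hz -> NAQ = 0.22
print(normalized_amplitude_quotient(0.3, 300.0, 220.0))
```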