Since emotions are expressed through a combination of verbal and nonverbal channels, a joint analysis of speech and gestures is required to understand expressive human communication. To facilitate such investigations, this paper describes a new corpus named the "interactive emotional dyadic motion capture database" (IEMOCAP), collected by the Speech Analysis and Interpretation Laboratory (SAIL) at the University of Southern California (USC). This database was recorded from ten actors in dyadic sessions with markers on the face, head, and hands, which provide detailed information about their facial expressions and hand movements during scripted and spontaneous spoken communication scenarios. The actors performed selected emotional scripts and also improvised hypothetical scenarios designed to elicit specific types of emotions (happiness, anger, sadness, frustration, and neutral state). The corpus contains approximately 12 hours of data. The detailed motion capture information, the interactive setting used to elicit authentic emotions, and the size of the database make this corpus a valuable addition to the existing databases in the community for the study and modeling of multimodal and expressive human communication.
To allocate healthcare resources appropriately, the triage classification system plays an important role in assessing the severity of illness of patients boarding at the emergency department. The self-reported pain intensity numerical rating scale (NRS) is one of the major modifiers of the current triage system based on the Taiwan Triage and Acuity Scale (TTAS). The validity and reliability of this self-report scheme for pain-level assessment are a major concern. In this study, we model observed expressive behaviors, i.e., facial expressions and vocal characteristics, directly from audio-video recordings in order to measure pain levels for patients during triage. This work demonstrates a feasible model, which achieves accuracies of 72.3% and 51.6% in binary and ternary pain intensity classification, respectively. Moreover, the results reveal a significant association between the current model and analgesic prescription/patient disposition after adjusting for patient-reported NRS and triage vital signs.
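The abstract does not specify the classifier, so the following is only a minimal sketch of the feature-level fusion idea: hypothetical pre-extracted facial and vocal feature vectors are concatenated and fed to an off-the-shelf classifier for binary pain-level labels. All array shapes, feature names, and the random-forest choice are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_patients = 200
face_feats = rng.normal(size=(n_patients, 64))   # e.g., facial expression statistics (assumed)
voice_feats = rng.normal(size=(n_patients, 32))  # e.g., prosodic/spectral functionals (assumed)
X = np.hstack([face_feats, voice_feats])         # simple feature-level fusion
y = rng.integers(0, 2, size=n_patients)          # synthetic binary pain label (mild/severe)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

A ternary variant would only change the label set; the fusion and evaluation steps stay the same.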
Pain is an unpleasant internal sensation caused by bodily damage or physical illness, with varied expressions conditioned on personal attributes. In this work, we propose an age-gender embedded latent acoustic representation learned using a conditional maximum mean discrepancy variational autoencoder (MMD-CVAE). The learned MMD-CVAE embeds personal attribute information directly in the latent space. Using these MMD-CVAE encoded features on a large-scale database of real patients in pain, our method achieves an accuracy of 70.7% in extreme-set classification (severe versus mild) and 47.7% in three-class recognition (severe, moderate, and mild). This corresponds to relative improvements of 11.34% and 17.51% in the extreme-set and three-class tasks, respectively, over an acoustic representation without age-gender conditioning. Further analyses reveal that, under severe pain, females have a higher maximum jitter and a lower harmonic energy ratio between F0, H1, and H2 compared to males, and that the minimum values of jitter and shimmer are higher in the elderly compared to the non-elderly group.
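To make the MMD-CVAE idea concrete, here is a minimal PyTorch sketch of a conditional autoencoder whose latent code is regularized toward a Gaussian prior with a maximum mean discrepancy (MMD) penalty instead of a KL term, conditioning on an attribute code. All layer sizes, the Gaussian-kernel MMD, and the attribute encoding are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def gaussian_mmd(z, z_prior, sigma=1.0):
    """MMD between encoded samples and prior samples with a Gaussian kernel."""
    def kernel(a, b):
        d = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)
        return torch.exp(-d / (2 * sigma ** 2))
    return kernel(z, z).mean() + kernel(z_prior, z_prior).mean() - 2 * kernel(z, z_prior).mean()

class MMDCVAE(nn.Module):
    def __init__(self, x_dim=88, c_dim=3, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim + c_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))

    def forward(self, x, c):
        z = self.enc(torch.cat([x, c], dim=-1))      # encoder sees acoustics + attributes
        x_hat = self.dec(torch.cat([z, c], dim=-1))  # decoder also conditioned on attributes
        return x_hat, z

model = MMDCVAE()
x = torch.randn(32, 88)   # stand-in for acoustic feature vectors
c = torch.randn(32, 3)    # hypothetical age-gender attribute code
x_hat, z = model(x, c)
loss = nn.functional.mse_loss(x_hat, x) + gaussian_mmd(z, torch.randn_like(z))
loss.backward()
```

In this setup, the encoded `z` (or the conditioned reconstruction features) would then be fed to a downstream pain-level classifier.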
In this article of the “Interdisciplinary Insights Into Group and Team Dynamics” special issue, we provide guidance for computer scientists and social scientists who seek an interdisciplinary approach to group research. We include how-to guidelines for researchers interested in initiating and maintaining collaborations, and we discuss opportunities and pitfalls of interdisciplinary group research. Finally, we include a brief case study that portrays some of the complications of creating shared understanding.
Background: Intelligent decision support systems (IDSS) have been applied to disease management tasks. Deep neural networks (DNNs) are artificial intelligence techniques that achieve high modeling power. The application of DNNs to large-scale data for estimating stroke risk needs to be assessed and validated. This study aims to apply a DNN to derive a stroke predictive model using a big electronic health record database.
Methods and results: The Taiwan National Health Insurance Research Database was used to conduct a retrospective population-based study. The database was divided into one development dataset for model training (~70% of total patients for training and ~10% for parameter tuning) and two testing datasets (each ~10%). A total of 11,192,916 claim records from 840,487 patients were used. The primary outcome was defined as any ischemic stroke in inpatient records within 3 years after study enrollment. The DNN was evaluated using the area under the receiver operating characteristic curve (AUC, or c-statistic). The development dataset included 672,214 patients (a total of 8,952,000 records), of whom 2,060 patients had stroke events. The mean age of the population was 35.5±20.2 years, with 48.5% men. The model achieved AUC values of 0.920 (95% confidence interval [CI], 0.908–0.932) in testing dataset 1 and 0.925 (95% CI, 0.914–0.937) in testing dataset 2. At a high-sensitivity operating point, the sensitivity and specificity were 92.5% and 79.8% for testing dataset 1, and 91.8% and 79.9% for testing dataset 2. At a high-specificity operating point, the sensitivity and specificity were 80.3% and 87.5% for testing dataset 1, and 83.7% and 87.5% for testing dataset 2. The DNN model maintained high predictability 5 years after being developed and achieved performance similar to other clinical risk assessment scores.
Conclusions: Applying a DNN algorithm to this large electronic health record database yields a high-performing model for assessment of ischemic stroke risk. Further research is needed to determine whether such a DNN-based IDSS could lead to an improvement in clinical practice.
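The evaluation protocol (AUC on held-out data, then sensitivity/specificity read off at chosen operating points) can be sketched as follows. The data here is synthetic, the network is a generic feed-forward classifier, and the split sizes are placeholders; none of this is the paper's actual model or features.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 30))                            # stand-in for claim-derived features
y = (X[:, 0] + rng.normal(size=5000) > 1.5).astype(int)    # synthetic stroke outcome

n_dev = 4000                                               # development set; rest held out for testing
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=1)
clf.fit(X[:n_dev], y[:n_dev])
scores = clf.predict_proba(X[n_dev:])[:, 1]                # predicted risk scores

print("AUC:", roc_auc_score(y[n_dev:], scores))
fpr, tpr, thr = roc_curve(y[n_dev:], scores)
i = int(np.argmin(np.abs(tpr - 0.92)))                     # pick a high-sensitivity operating point
print("sensitivity:", tpr[i], "specificity:", 1 - fpr[i])
```

A high-specificity operating point is chosen the same way, searching the threshold array for the desired false-positive rate instead.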
Vocal expression is essential for conveying emotion during social interaction. Although vocal emotion has been explored in previous studies, little is known about how the perception of different vocal emotional expressions modulates functional brain network topology. In this study, we aimed to investigate the functional brain networks under different attributes of vocal emotion using graph-theoretical network analysis. Functional magnetic resonance imaging (fMRI) experiments were performed on 36 healthy participants. We utilized the Power-264 functional brain atlas to calculate the interregional functional connectivity (FC) from fMRI data under the resting state and under vocal stimuli at different arousal and valence levels. The orthogonal minimal spanning trees method was used for topological filtering. The paired-sample t-test with Bonferroni correction across all regions and arousal–valence levels was used for statistical comparisons. Our results show that the brain network exhibits significantly altered network attributes at the FC, nodal, and global levels, especially under high-arousal or negative-valence vocal emotional stimuli. The alterations within and between well-known large-scale functional networks were also investigated. Through the present study, we have gained more insight into how comprehending emotional speech modulates brain networks. These findings may shed light on how the human brain processes emotional speech and how it distinguishes different emotional conditions.
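An illustrative sketch of the graph pipeline: regional time series are correlated into a functional-connectivity matrix, then reduced to a spanning-tree backbone before computing network metrics. A single minimum spanning tree stands in here for the paper's orthogonal-MST filtering, and the time series are random placeholders.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
ts = rng.normal(size=(264, 200))        # 264 Power-atlas regions x 200 time points (synthetic)
fc = np.corrcoef(ts)                    # interregional functional connectivity

# Stronger correlation = stronger edge, so minimize (1 - |r|) to keep strong links.
g = nx.from_numpy_array(1.0 - np.abs(fc))
mst = nx.minimum_spanning_tree(g)
print("backbone edges:", mst.number_of_edges())          # 263 edges for a 264-node tree
print("global efficiency:", nx.global_efficiency(mst))   # example global-level metric
```

Nodal- and global-level attributes computed on such filtered graphs are what the paired t-tests then compare across arousal–valence conditions.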
Pain is an internal construct with vocal manifestations that vary as a function of personal and clinical attributes. Understanding the vocal indicators of pain levels is important for providing an objective analytic in clinical assessment and intervention. In this work, we focus on investigating the variability of voice quality as a function of multiple clinical parameters at different pain levels, specifically for emergency room patients during triage. Their pain-induced pathological voice quality characteristics are naturally affected by individual attributes such as age, gender, and pain site. We conduct a detailed multivariate statistical analysis of the vocal quality of 181 unique patients using recordings of real triage sessions. Our analysis shows several important insights: 1) voice quality varies statistically with pain levels only when interaction effects from other clinical parameters are considered; 2) the senior group shows higher values of voicing probability and shimmer when experiencing severe pain; 3) patients with abdominal pain have lower jitter and shimmer during severe pain, differing from patients experiencing musculoskeletal pathology; and 4) there could be a relationship between the variation in voice quality and the neural pathway of pain, as evidenced by the interaction with the pain-site factor.
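A toy illustration of the interaction analysis: a two-way linear model of one voice-quality measure (jitter) against pain level and age group, mirroring the finding that pain effects emerge through interactions with patient attributes. The data frame is synthetic and the factor levels are placeholders, not the study's variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 181
df = pd.DataFrame({
    "pain": rng.choice(["mild", "severe"], size=n),
    "age_group": rng.choice(["senior", "non_senior"], size=n),
})
# Inject an interaction effect: jitter rises only for severe pain in seniors.
df["jitter"] = rng.normal(size=n) + 0.5 * ((df.pain == "severe") & (df.age_group == "senior"))

model = smf.ols("jitter ~ C(pain) * C(age_group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects plus the interaction term
```

The same formula pattern extends to additional factors such as pain site, which is where the reported interaction effects were observed.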