2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2019
DOI: 10.1109/embc.2019.8857691
Detecting emotional valence using time-domain analysis of speech signals

Cited by 9 publications (7 citation statements) | References 23 publications
“…42,43 At the same time, an end-to-end learning from acoustic features like the Mel-frequency cepstral coefficients (MFCCs) suffers from task independence and requires more resources especially in long audio files. 44,45 We note that the performance drop due to the automated transcription is rather modest, 3.5% in AUC for Task I and 6.0% for Task II, when using the STS+RS+Dem. method.…”
Section: Discussion
confidence: 95%
“…Another characteristic of our study is that it relies on semantic features, enabling us to transfer the entire pipeline to other languages, given the existence of transcription tools from any language to English and/or powerful NLP models in different languages 42,43 . At the same time, an end‐to‐end learning from acoustic features like the Mel‐frequency cepstral coefficients (MFCCs) suffers from task independence and requires more resources especially in long audio files 44,45 . We note that the performance drop due to the automated transcription is rather modest, 3.5% in AUC for Task I and 6.0% for Task II, when using the STS+RS+Dem.…”
Section: Discussion
confidence: 99%
“…These were calculated for each speaker turn, that is, when the surgeon was speaking as part of a communication sender or receiver. Successive differences were also calculated for each vocal feature per turn to capture changes of the features over time (Deshpande et al, 2019). Descriptive statistics obtained for these eight measures included minimum, maximum, mean, standard deviation, range, and interquartile range.…”
Section: Methods
confidence: 99%
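The per-turn processing described above can be sketched as follows; this is a minimal NumPy illustration based only on the quoted description (the function names are assumptions, and computing the interquartile range via the 25th/75th percentiles is an assumption, not the cited study's code).

```python
import numpy as np

def successive_differences(feature_values: np.ndarray) -> np.ndarray:
    """Successive differences of one vocal feature within a speaker turn,
    capturing how the feature changes over time."""
    return np.diff(feature_values)

def turn_statistics(feature_values: np.ndarray) -> dict:
    """Descriptive statistics for one vocal feature within a speaker turn:
    minimum, maximum, mean, standard deviation, range, interquartile range."""
    q25, q75 = np.percentile(feature_values, [25, 75])
    return {
        "min": float(feature_values.min()),
        "max": float(feature_values.max()),
        "mean": float(feature_values.mean()),
        "std": float(feature_values.std()),
        "range": float(feature_values.max() - feature_values.min()),
        "iqr": float(q75 - q25),
    }
```

In this reading, each speaker turn yields one vector of summary statistics per feature, so turns of different durations map to a fixed-length representation.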
“…Cook et al [19], [20] explored the structure of the fundamental frequency (F0), extracting dominant pitches in the detection of valence from speech. Deshpande et al [21] proposed a reduced feature set consisting of the autocorrelation of pitch contour, root mean square (RMS) energy and a 10-dimensional time domain difference (TDD) vector. The TDD vector corresponds to successive differences in the speech signal.…”
Section: Improving the Prediction of Valence
confidence: 99%
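The reduced feature set named in the statement above can be sketched as follows; this is a minimal NumPy illustration of the three ingredients (pitch-contour autocorrelation, RMS energy, and a TDD vector), assuming a mono speech frame and a pre-extracted pitch contour. The reduction of the successive differences into exactly 10 dimensions by segment averaging is an assumption, not the authors' implementation.

```python
import numpy as np

def rms_energy(frame: np.ndarray) -> float:
    """Root mean square energy of a speech frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def pitch_autocorrelation(pitch_contour: np.ndarray, max_lag: int = 20) -> np.ndarray:
    """Normalized autocorrelation of the (mean-removed) pitch contour
    for lags 1..max_lag."""
    p = pitch_contour - pitch_contour.mean()
    denom = np.sum(p ** 2)
    if denom == 0:  # constant contour: no variation to correlate
        return np.zeros(max_lag)
    return np.array(
        [np.sum(p[:-lag] * p[lag:]) / denom for lag in range(1, max_lag + 1)]
    )

def tdd_vector(signal: np.ndarray, dim: int = 10) -> np.ndarray:
    """Time-domain difference (TDD) vector: successive differences of the
    speech signal, summarized into a fixed-length vector by averaging the
    absolute differences over `dim` equal segments (segment averaging is
    an assumed reduction)."""
    diffs = np.abs(np.diff(signal))
    segments = np.array_split(diffs, dim)
    return np.array([seg.mean() for seg in segments])
```

Concatenating the three outputs would give a compact time-domain feature vector per frame, consistent with the statement that the approach avoids heavier spectral features such as MFCCs.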