VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Zhou, Kun; Şişman, Berrak; Li, Haizhou

doi:10.48550/arxiv.2011.02314

Cited by 1 publication

(1 citation statement)

References 52 publications

“…Previous research endeavors have explored various ambient sounds as cited in [17], yet remarkably, the emotional states of individuals within the room have never been integrated into the equation; doing so amplifies the complexity of the challenge at hand and makes it more realistic. Furthermore, the use of generative models has increased in the study of human emotions and context analysis due to their flexibility and versatility in representing and analyzing the different types of data present together in the same audio frame [18]-[24]. Generative models such as variational autoencoders (VAEs), particularly when integrated with classifiers, prove to be highly advantageous for classification tasks in the domain of speech and also with ambient sound.…”

Section: Introduction (mentioning)

confidence: 99%

Estimation of Hazardous Environments Through Speech and Ambient Noise Analysis

Porco,

Dongshik

2023

IJACSA

In recent years, the engineering academic community has directed significant attention toward developing artificial empathy. Replicating artificial empathy requires agents that can discern human emotions and recognize environmental risks. Compared with video and camera data, analyzing acoustic data in real environments is non-invasive and better preserves privacy, because it limits the agent's understanding to specific patterns. However, current studies are hampered by subjective inferences drawn from real data, which can lead to inaccurate predictions, both false positives and false negatives, especially when contextual data and human speech are involved. This work proposes estimating whether an environment is dangerous from emotional speech and additional ambient noises. In this approach, we implement a variational autoencoder together with a classifier for the classification task, and apply additional regularization techniques to bridge the gap between the original training data and the expected data. The classifier uses feature data generated by the variational autoencoder to extract class patterns and determine whether the environment is hazardous. Speech expressing anger, sadness, or fear contributes to a dangerous classification, while happy, calm, and neutral speech is considered safe. Among ambient noises, types such as gunfire and broken glass are categorized as dangerous, while everyday indoor sounds such as cooking, eating, and movement are considered safe.
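The label scheme in the last two sentences of the abstract can be sketched as a simple lookup. This is a minimal illustration under stated assumptions: the sets contain only the example classes the abstract names (its noise list is explicitly non-exhaustive), `is_hazardous` is a hypothetical helper, and the OR rule that flags an environment when either cue is dangerous is assumed, since the abstract does not say how speech and noise evidence are combined. In the described system these labels serve as targets for a classifier trained on VAE features, not as a hand-written rule.

```python
# Example classes named in the abstract; the noise sets are NOT exhaustive.
DANGEROUS_EMOTIONS = {"angry", "sad", "scared"}
SAFE_EMOTIONS = {"happy", "calm", "neutral"}
DANGEROUS_NOISES = {"gunfire", "broken_glass"}
SAFE_NOISES = {"cooking", "eating", "movements"}


def is_hazardous(emotion: str, noise: str) -> bool:
    """Hypothetical helper: True if either the speech cue or the noise cue is dangerous.

    Assumption: the OR combination is illustrative only; the abstract does not
    specify how the two cues are combined.
    """
    if emotion not in DANGEROUS_EMOTIONS | SAFE_EMOTIONS:
        raise ValueError(f"unknown emotion label: {emotion!r}")
    if noise not in DANGEROUS_NOISES | SAFE_NOISES:
        raise ValueError(f"unknown noise label: {noise!r}")
    return emotion in DANGEROUS_EMOTIONS or noise in DANGEROUS_NOISES


if __name__ == "__main__":
    print(is_hazardous("calm", "cooking"))     # False
    print(is_hazardous("scared", "eating"))    # True
    print(is_hazardous("neutral", "gunfire"))  # True
```

Raising on unknown labels keeps the sketch honest about its closed label set; a real pipeline would instead let the classifier output a probability of danger.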

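The abstract pairs a variational autoencoder with a classifier and mentions additional regularization. The sketch below shows standard VAE pieces such a pipeline typically relies on: the reparameterization trick, and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. It adds a hypothetical joint loss with a classification cross-entropy term. The function names, the weights `beta` and `alpha`, and the way the terms are summed are assumptions for illustration, not the paper's published objective; a real model would compute these with an autodiff library.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(logvar / 2).

    In an autodiff framework this form lets gradients reach mu and logvar;
    plain Python here only illustrates the arithmetic.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def cross_entropy(probs, label):
    """Negative log-probability the classifier assigns to the true class."""
    return -math.log(probs[label])


def joint_loss(recon_error, mu, logvar, probs, label, beta=1.0, alpha=1.0):
    """Hypothetical objective: reconstruction + beta * KL + alpha * classification."""
    return recon_error + beta * gaussian_kl(mu, logvar) + alpha * cross_entropy(probs, label)


if __name__ == "__main__":
    print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0: posterior equals the prior
    print(joint_loss(0.2, [1.0], [0.0], [0.5, 0.5], 0))
```

Setting `beta` above 1 strengthens the pull of the latent codes toward the prior, one common way of regularizing a VAE (as in beta-VAE); the abstract does not state which regularization techniques the paper actually uses.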