Conditional Generative Data Augmentation for Clinical Audio Datasets

Seibold, Matthias; Hoch, Armando; Farshad, Mazda; Navab, Nassir; Fürnstahl, Philipp

doi:10.1007/978-3-031-16449-1_33

“…The value of r is a hyperparameter, and for our method it was set to 16. The generator has a total of 1,537,316 parameters. For the discriminator we use a fully convolutional network architecture with a total of 4,321,153 parameters, analogous to our own previous work [12]. Both the generator and the discriminator employ the LeakyReLU non-linear activation function throughout the whole network structure.…”

Section: Proposed Data Augmentation Methods

confidence: 99%

“…Channel attention has been successfully exploited to model channel-level dependencies and facilitate the learning of less redundant features [16] [17] [18], which subsequently improved model performance. Motivated by these observations, in this paper we demonstrate that, due to its huge number of model parameters, the conditional generative adversarial network (cWGAN-GP [12]) learns redundant features. To combat this, we introduce a channel-wise attention mechanism in the generator sub-network by implementing Squeeze & Excitation [16] blocks and residual skip connections [19].…”

Section: Introduction

confidence: 97%
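
The channel-attention idea in the statement above can be sketched in a few lines. This is a minimal illustration of a Squeeze & Excitation gate with a residual connection, not the authors' generator: it assumes that r in the first excerpt denotes the SE reduction ratio (the excerpt does not say so), it uses the identity as the residual branch for brevity, and it uses the ReLU/sigmoid pair from the standard SE formulation. The names `se_residual` and `se_param_count` are hypothetical.

```python
import math

def se_residual(feature_map, w1, b1, w2, b2):
    """Apply an SE gate plus identity skip to a C x H x W nested list.

    w1 has shape (C // r) x C and w2 has shape C x (C // r); both are
    illustrative dense-layer weights, not values from the paper.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: bottleneck layer with ReLU (C -> C // r).
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)) + b)
              for row, b in zip(w1, b1)]
    # Excitation, step 2: expansion layer with sigmoid gate (C // r -> C).
    gates = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale every channel by its gate, then add the skip connection.
    return [[[v * g + v for v in row] for row in ch]
            for ch, g in zip(feature_map, gates)]

def se_param_count(channels, r):
    """Weights and biases of the two dense layers in one SE block."""
    hidden = channels // r
    return 2 * channels * hidden + hidden + channels

# Example: 16 channels, r = 16, all-zero weights give a gate of 0.5 everywhere.
x = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(16)]
out = se_residual(x, [[0.0] * 16], [0.0], [[0.0] for _ in range(16)], [0.0] * 16)
```

A larger r shrinks the bottleneck and therefore the number of parameters the attention gate adds, which is the trade-off this hyperparameter controls in a standard SE block.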

Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data

Margaryan,

Seibold,

Joshi

et al. 2022

Preprint


Data augmentation is a valuable tool for the design of deep learning systems to overcome data limitations and stabilize the training process. This holds especially in the medical domain, where collecting large-scale data sets is challenging and expensive due to limited access to patient data and relevant environments as well as strict regulations. There, community-curated large-scale public datasets, pretrained models, and advanced data augmentation methods are the main factors for developing reliable systems to improve patient care. However, for the development of medical acoustic sensing systems, an emerging field of research, the community lacks large-scale publicly available data sets and pretrained models. To address the problem of limited data, we propose a conditional generative adversarial neural network-based augmentation method which is able to synthesize mel spectrograms from a learned data distribution of a source data set. In contrast to previously proposed fully convolutional models, the proposed model implements residual Squeeze and Excitation modules in the generator architecture. We show that our method outperforms all classical audio augmentation techniques and previously published generative methods in terms of generated sample quality, and that it improves the Macro F1-Score of a classifier trained on the augmented data set by 2.84%, an enhancement of 1.14% in relation to previous work. By analyzing the correlation of intermediate feature spaces, we show that the residual Squeeze and Excitation modules help the model to reduce redundancy in the latent features. Therefore, the proposed model advances the state-of-the-art in the augmentation of clinical audio data and alleviates the data bottleneck for the design of clinical acoustic sensing systems.
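
For readers unfamiliar with the metric quoted in the abstract, the following is a small sketch of how a macro F1-score is usually computed: the per-class F1 values are averaged with equal weight, so rare classes count as much as frequent ones. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper's evaluation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example with three of the surgical action classes (made-up predictions).
y_true = ["Sawing", "Sawing", "Suction", "Suction", "Suction", "Reaming"]
y_pred = ["Sawing", "Suction", "Suction", "Suction", "Suction", "Reaming"]
score = macro_f1(y_true, y_pred)  # (2/3 + 6/7 + 1) / 3
```

Because every class contributes equally, a single misclassified sample from a small class moves the macro score much more than one from a large class.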


“…We use a publicly available data set [12] that was recorded during real Total Hip Arthroplasty surgeries and contains sounds of the typical surgical actions performed during the intervention, which roughly resemble the different phases of the procedure. The data set includes 568 recordings with a length of 1 s to 31 s and the following distribution: n_raw,Adjustment = 68, n_raw,Coagulation = 117, n_raw,Insertion = 76, n_raw,Reaming = 64, n_raw,Sawing = 21, and n_raw,Suction = 222.…”

Section: Data Set Preprocessing and Benchmark Augmentations

confidence: 99%
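
The class counts quoted above can be checked and summarized with a short script. The counts come from the excerpt; the inverse-frequency class weights are a common heuristic added here for illustration only and are not described in the cited text.

```python
# Raw recordings per class, as listed in the citation statement.
counts = {
    "Adjustment": 68,
    "Coagulation": 117,
    "Insertion": 76,
    "Reaming": 64,
    "Sawing": 21,
    "Suction": 222,
}

total = sum(counts.values())  # should match the 568 recordings reported
shares = {name: n / total for name, n in counts.items()}

# Ratio between the largest and the smallest class.
imbalance = max(counts.values()) / min(counts.values())

# Assumption: inverse-frequency weights, total / (num_classes * class_count).
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name:12s} n={counts[name]:3d} share={shares[name]:.3f} weight={weights[name]:.2f}")
```

The smallest class (Sawing) is roughly a tenth the size of the largest (Suction), which illustrates why class-conditional synthesis is attractive for this data set.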

“…Therefore, especially in the medical domain, data augmentation is a valuable tool to artificially increase the size of a training data set to increase the diversity of training examples and stabilize the training process. To address this issue, we published a medical audio dataset in a previous work which contains acoustic signals recorded in the real operating room during THA procedures which resemble typical surgical actions such as hammering, drilling, or sawing [12] and proposed a data augmentation method based on a conditional generative adversarial network.…”

Section: Introduction

confidence: 99%

Improved Techniques for the Conditional Generative Augmentation of Clinical Audio Data

Margaryan,

Seibold,

Joshi

et al. 2023

Lecture Notes in Electrical Engineering

Self Cite

Generative AI Mitigates Representation Bias and Improves Model Fairness Through Synthetic Health Data

Micheletti,

Marchesi,

Kuo

et al. 2023

Preprint


Representation bias in health data can lead to unfair decisions, compromising the generalisability of research findings and impeding under-represented subpopulations from benefiting from clinical discoveries. Several approaches have been developed to mitigate representation bias, ranging from simple resampling methods, such as SMOTE, to recent approaches based on generative adversarial networks (GAN). However, generating high-dimensional time-series synthetic health data remains challenging for both resampling and GAN-based approaches. In this work, we propose a novel CA-GAN architecture able to synthesise authentic, high-dimensional time series data. CA-GAN outperforms state-of-the-art methods in qualitative and quantitative evaluation while avoiding mode collapse, a significant GAN failure. We evaluate CA-GAN’s generalisability in mitigating representation bias for Black patients in two diverse, clinically relevant datasets: acute hypotension and sepsis. Finally, we show that CA-GAN generates authentic data of the minority class while faithfully maintaining the original distribution of both datasets.

Conditional Generative Data Augmentation for Clinical Audio Datasets

Cited by 6 publications

References 25 publications
