Andrew Yale scite author profile

Generation and evaluation of privacy preserving synthetic health data

We develop metrics for measuring the quality of synthetic health data for both education and research. We use novel and existing metrics to capture a synthetic dataset's resemblance, privacy, utility, and footprint. Using these metrics, we develop an end-to-end workflow based on our generative adversarial network (GAN) method, HealthGAN, that creates privacy-preserving synthetic health data. Our workflow meets the privacy specifications of our data partner: (1) HealthGAN is trained inside a secure environment; (2) the trained HealthGAN model is used outside the secure environment by external users to generate synthetic data. This second step simplifies data handling for external users by avoiding de-identification, which may require special user training, be costly, or cause a loss of data fidelity. We compare this workflow against five baseline methods. While maintaining resemblance and utility comparable to the other methods, HealthGAN provides the best privacy and footprint. We present two case studies in which our methodology was put to work in classroom and research settings. We evaluate utility in the classroom through a data analysis challenge given to students, and in research by replicating three medical papers with synthetic data. Data, code, and the challenge that we organized for educational purposes are available.
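The abstract does not spell out its resemblance and privacy metrics. Purely as an illustration, the sketch below implements a nearest-neighbor adversarial accuracy score and a privacy-loss gap derived from it, a family of metrics used for synthetic tabular data. The function names, the Euclidean distance, the equal-sample-size requirement, and the exact formulas are assumptions of this sketch and may differ from the paper's own definitions.

```python
import numpy as np


def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distance between every row of a and every row of b, shape (len(a), len(b))."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def nn_adversarial_accuracy(real: np.ndarray, synth: np.ndarray) -> float:
    """Assumed nearest-neighbor adversarial accuracy (illustrative, not the paper's code).

    ~0.5: real and synthetic records are hard to tell apart.
    ~0.0: synthetic records sit closer to real ones than real records sit to each other
          (a sign of copying).
    ~1.0: synthetic records are far from the real data (poor resemblance).
    """
    if real.shape != synth.shape:
        # Simplifying assumption of this sketch: both samples have the same size.
        raise ValueError("real and synth must have the same shape")
    d_rs = pairwise_distances(real, synth)
    d_rr = pairwise_distances(real, real)
    d_ss = pairwise_distances(synth, synth)
    # Ignore each record's zero distance to itself within its own set.
    np.fill_diagonal(d_rr, np.inf)
    np.fill_diagonal(d_ss, np.inf)
    # Real side: is the nearest synthetic record farther than the nearest other real record?
    real_term = np.mean(d_rs.min(axis=1) > d_rr.min(axis=1))
    # Synthetic side: is the nearest real record farther than the nearest other synthetic record?
    synth_term = np.mean(d_rs.min(axis=0) > d_ss.min(axis=1))
    return float(0.5 * (real_term + synth_term))


def privacy_loss(train: np.ndarray, test: np.ndarray, synth: np.ndarray) -> float:
    """Assumed privacy-loss gap: accuracy against held-out data minus accuracy against training data."""
    return nn_adversarial_accuracy(test, synth) - nn_adversarial_accuracy(train, synth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(200, 3))
    test = rng.normal(size=(200, 3))
    fresh = rng.normal(size=(200, 3))   # stand-in for a well-behaved generator
    leaky = train.copy()                # worst case: the "synthetic" data copies training records
    print(privacy_loss(train, test, fresh), privacy_loss(train, test, leaky))
```

A gap near zero means the synthetic records are no closer to the training patients than to unseen ones; a clearly positive gap, as in the copied-data case, points to memorization of the training set.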

Medical Time-Series Data Generation Using Generative Adversarial Networks

Dash, Yale, Guyon et al. 2020

We propose a novel bootstrap procedure for dependent data based on Generative Adversarial Networks (GANs). We show that GANs can learn the dynamics of common stationary time series processes, and that GANs trained on a single sample path can generate additional samples from the process. We find that temporal convolutional neural networks are a suitable design for the generator and discriminator, and that convincing samples can be generated from a vector of iid normal noise. We demonstrate the finite-sample properties of GAN sampling and of the proposed bootstrap through simulations in which we compare its performance with circular block bootstrapping when resampling an AR(1) time series process. We find that resampling with the GAN can outperform circular block bootstrapping in terms of empirical coverage. Acknowledgements: The authors gratefully acknowledge support from the Google TensorFlow Research Cloud (TFRC). PyTorch code for this paper is available on request. We also thank Giovanni Mellace and Peter Sandholt Jensen for useful comments.
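The abstract compares GAN resampling with circular block bootstrapping of an AR(1) process in terms of empirical coverage. A GAN does not fit in a short example, but the baseline and the coverage experiment can be sketched as below; a GAN-based resampler would be swapped in at the `circular_block_bootstrap` step. The AR coefficient, sample size, block length, number of bootstrap draws, the percentile interval for the mean, and all function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def simulate_ar1(n, phi, rng, burn_in=200):
    """Simulate x_t = phi * x_{t-1} + e_t with iid N(0, 1) innovations (true mean 0)."""
    e = rng.standard_normal(n + burn_in)
    x = np.empty(n + burn_in)
    x[0] = e[0]
    for t in range(1, n + burn_in):
        x[t] = phi * x[t - 1] + e[t]
    return x[burn_in:]  # drop the burn-in so the path is close to stationary


def circular_block_bootstrap(x, block_len, rng):
    """One circular block bootstrap resample of x, with the same length as x."""
    n = len(x)
    n_blocks = -(-n // block_len)  # ceil(n / block_len)
    starts = rng.integers(0, n, size=n_blocks)
    # Blocks that run past the end wrap around to the start (the "circular" part).
    idx = (starts[:, None] + np.arange(block_len)[None, :]) % n
    return x[idx.ravel()[:n]]


def bootstrap_ci_mean(x, block_len, n_boot, rng, alpha=0.05):
    """Percentile confidence interval for the mean of x."""
    means = np.array([circular_block_bootstrap(x, block_len, rng).mean()
                      for _ in range(n_boot)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])


def empirical_coverage(phi=0.5, n=200, block_len=10, n_boot=300, n_rep=200, seed=0):
    """Fraction of simulated paths whose interval contains the true mean, 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        x = simulate_ar1(n, phi, rng)
        lo, hi = bootstrap_ci_mean(x, block_len, n_boot, rng)
        hits += int(lo <= 0.0 <= hi)
    return hits / n_rep


if __name__ == "__main__":
    print(empirical_coverage())  # compare against the nominal 0.95
```

Coverage below the nominal 95% is the typical failure mode for block methods with short blocks, which is the kind of gap the paper reports the GAN-based resampler can narrow.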

Synthesizing Quality Open Data Assets from Private Health Research Studies

Yale, Dash, Bhanot et al. 2020

Generating synthetic data is an attractive way to create open data, enabling health research and education while preserving patient privacy. We reproduce the research outcomes of two previously published studies that used private health data, this time using synthetic data generated with a method we developed called HealthGAN. We demonstrate the value of our methodology for generating synthetic health data and evaluating its quality and privacy. The datasets come from the OptumLabs® Data Warehouse (OLDW). The OLDW is accessed within a secure environment that does not allow exporting patient-level data of any type, real or synthetic; therefore, HealthGAN exports a privacy-preserving generator model instead. The studies examine questions related to comorbidities of Autism Spectrum Disorder (ASD) using medical records of children with ASD and matched patients without ASD. HealthGAN generates high-quality synthetic data that produce similar results while preserving patient privacy. By creating synthetic versions of these datasets that maintain privacy and achieve a high level of resemblance and utility, we create valuable open health data assets for future research and education.

Assessing privacy and quality of synthetic health data

Yale, Dash, Dutta et al. 2019

Impact of chronic heart failure on adipose tissue functional plasticity: a role for fatty acids?

Yale, Cobb, Lyon et al. 2014

EJEA

Schol…Exodus? Learning Within/Against/Beyond the Institution

Carpenter, Goldblatt, Hanson et al. 2018

Feces on the Philosophy of History!

Carpenter, Goldblatt, Hanson et al. 2014

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Andrew Yale

Generation and evaluation of privacy preserving synthetic health data

Medical Time-Series Data Generation Using Generative Adversarial Networks

Synthesizing Quality Open Data Assets from Private Health Research Studies

Assessing privacy and quality of synthetic health data

Impact of chronic heart failure on adipose tissue functional plasticity: a role for fatty acids?

Schol…Exodus? Learning Within/Against/Beyond the Institution

Feces on the Philosophy of History!
