Anonymiced Shareable Data: Using mice to Create and Analyze Multiply Imputed Synthetic Datasets

Volker, Thom Benjamin; Vink, Gerko

doi:10.3390/psych3040045

Cited by 6 publications (7 citation statements); references 32 publications


“…In cases where this model can be specified directly, conventional software for MI can be used to draw the synthetic values x_syn^(m) (e.g., the packages "norm" or "jomo" in R; Quartagno et al., 2019; Schafer & Olsen, 1998; see also Volker & Vink, 2021). Otherwise, the synthetic data can be generated from a sequential model:…”

Section: Illustrative Example (mentioning; confidence: 99%)
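When the synthesis model "can be specified directly", each synthetic dataset is a draw from the posterior predictive distribution of a joint model. The sketch below assumes a multivariate normal model under a standard noninformative prior, roughly the kind of model the "norm" package handles. It is written in Python with NumPy rather than R, and the function names `draw_mvn_posterior` and `synthesize_joint` are hypothetical; they are not part of norm, jomo, or mice.

```python
import numpy as np


def draw_mvn_posterior(x, rng):
    """Draw (mu, Sigma) from the posterior of a multivariate normal model,
    assuming the noninformative prior p(mu, Sigma) ∝ |Sigma|^(-(p+1)/2)."""
    n, p = x.shape
    xbar = x.mean(axis=0)
    resid = x - xbar
    s = resid.T @ resid  # sums of squares and cross-products
    # Sigma | x ~ Inverse-Wishart(n - 1, S): draw W ~ Wishart(n - 1, S^-1)
    # as a sum of n - 1 outer products of N(0, S^-1) vectors, then invert.
    chol = np.linalg.cholesky(np.linalg.inv(s))
    z = rng.standard_normal((n - 1, p)) @ chol.T
    sigma = np.linalg.inv(z.T @ z)
    sigma = (sigma + sigma.T) / 2  # remove floating-point asymmetry
    # mu | Sigma, x ~ N(xbar, Sigma / n)
    mu = rng.multivariate_normal(xbar, sigma / n)
    return mu, sigma


def synthesize_joint(x, m, rng):
    """Return m fully synthetic copies of x, one posterior draw per copy."""
    copies = []
    for _ in range(m):
        mu, sigma = draw_mvn_posterior(x, rng)
        copies.append(rng.multivariate_normal(mu, sigma, size=x.shape[0]))
    return copies


rng = np.random.default_rng(2021)
original = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.5], [0.5, 2.0]], size=500)
synthetic = synthesize_joint(original, m=5, rng=rng)
```

Drawing fresh parameters for every copy, instead of reusing one point estimate, is what lets the between-copy variability reflect uncertainty about the model.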

“…We assume that masked copies x1* and x2* of the original x1 and x2 have been created and added to the data set. In DA-MI_P, the masked copies x1* and x2* are treated as additional predictors, and the synthetic data are simulated from: [equation omitted] In cases where this model can be specified directly, conventional software for MI can be used to draw the synthetic values x_syn^(m) (e.g., the packages “norm” or “jomo” in R; Quartagno et al., 2019; Schafer & Olsen, 1998; see also Volker & Vink, 2021). Otherwise, the synthetic data can be generated from a sequential model: [equation omitted] which can be implemented in the “synthpop” package in R (Nowok et al., 2016) or similar software by adding the masked copies as additional predictor variables in the synthesis model.…”

Section: Data-Augmented MI of Synthetic Data (DA-MI) (mentioning; confidence: 99%)
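The sequential variant quoted above can be sketched as follows: each variable is synthesized in turn from a regression on the variables synthesized before it, plus all masked copies as extra predictors. This is a minimal sketch under stated assumptions, not Grund et al.'s implementation or synthpop's API. Masking is assumed to be additive Gaussian noise (the hypothetical `mask` helper and its `noise_sd_ratio` parameter), and every conditional model is assumed to be a Bayesian normal linear regression.

```python
import numpy as np


def draw_regression(X, y, rng):
    """Posterior draw of (beta, sigma) for y = X beta + e, prior p(beta, sigma^2) ∝ 1/sigma^2."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    rss = float(np.sum((y - X @ beta_hat) ** 2))
    sigma2 = rss / rng.chisquare(n - k)  # scaled inverse chi-square draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def mask(x, noise_sd_ratio, rng):
    """Hypothetical masking step: add Gaussian noise scaled to each column's SD."""
    return x + rng.standard_normal(x.shape) * (noise_sd_ratio * x.std(axis=0, ddof=1))


def synthesize_da_mi(x, x_star, m, rng):
    """Sequential synthesis in which the masked copies x_star are extra predictors."""
    n, p = x.shape
    ones = np.ones((n, 1))
    datasets = []
    for _ in range(m):
        syn = np.empty_like(x)
        for j in range(p):
            # Fit on the original data: earlier original columns plus all masked copies.
            beta, sigma = draw_regression(np.hstack([ones, x[:, :j], x_star]), x[:, j], rng)
            # Predict with the earlier *synthetic* columns plus the same masked copies.
            x_new = np.hstack([ones, syn[:, :j], x_star])
            syn[:, j] = x_new @ beta + sigma * rng.standard_normal(n)
        datasets.append(syn)
    return datasets


rng = np.random.default_rng(7)
x1 = rng.standard_normal(400)
x2 = 0.5 * x1 + rng.standard_normal(400)
x = np.column_stack([x1, x2])
x_star = mask(x, noise_sd_ratio=0.5, rng=rng)
synthetic = synthesize_da_mi(x, x_star, m=5, rng=rng)
```

Fitting on original values but predicting from already-synthesized columns mirrors the usual sequential (synthpop-style) scheme; the masked copies enter both steps unchanged.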


Using synthetic data to improve the reproducibility of statistical results in psychological research.

Grund, Lüdtke, Robitzsch (2022), Psychological Methods


In recent years, psychological research has faced a credibility crisis, and open data are often regarded as an important step toward a more reproducible psychological science. However, privacy concerns are among the main reasons that prevent data sharing. Synthetic data procedures, which are based on the multiple imputation (MI) approach to missing data, can be used to replace sensitive data with simulated values, which can be analyzed in place of the original data. One crucial requirement of this approach is that the synthesis model is correctly specified. In this article, we investigated the statistical properties of synthetic data with a particular emphasis on the reproducibility of statistical results. To this end, we compared conventional approaches to synthetic data based on MI with a data-augmented approach (DA-MI) that attempts to combine the advantages of masking methods and synthetic data, thus making the procedure more robust to misspecification. In multiple simulation studies, we found that the good properties of the MI approach strongly depend on the correct specification of the synthesis model, whereas the DA-MI approach can provide useful results even under various types of misspecification. This suggests that the DA-MI approach to synthetic data can provide an important tool that can be used to facilitate data sharing and improve reproducibility in psychological research. In a working example, we also demonstrate the implementation of these approaches in widely available software, and we provide recommendations for practice.
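The abstract's central claim is that MI synthesis depends on a correctly specified model, while DA-MI tolerates misspecification better. A toy setting makes this visible. It is an assumption-laden illustration, not a reproduction of the authors' simulation studies. The outcome has a quadratic relation to x, the synthesis model is linear, and the hypothetical masked copy `y_star` is y plus Gaussian noise. Python with NumPy is used, and all function names are hypothetical.

```python
import numpy as np


def draw_regression(X, y, rng):
    """Posterior draw of (beta, sigma) for a normal linear model, prior ∝ 1/sigma^2."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    sigma2 = float(np.sum((y - X @ beta_hat) ** 2)) / rng.chisquare(n - k)
    return rng.multivariate_normal(beta_hat, sigma2 * xtx_inv), np.sqrt(sigma2)


def synthesize_y(predictors, y, rng):
    """Replace y with a posterior predictive draw from a linear model on `predictors`."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, sigma = draw_regression(X, y, rng)
    return X @ beta + sigma * rng.standard_normal(len(y))


def quadratic_coef(x, y):
    """OLS coefficient of x^2 in the analysis model y ~ 1 + x + x^2."""
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]


rng = np.random.default_rng(3)
n = 5000
x = rng.standard_normal(n)
y = x + x ** 2 + rng.standard_normal(n)  # true curvature: coefficient 1
y_star = y + rng.normal(0.0, 1.0, n)     # hypothetical masked copy of y

y_mi = synthesize_y(x, y, rng)                             # misspecified: linear in x only
y_da = synthesize_y(np.column_stack([x, y_star]), y, rng)  # masked copy as extra predictor

for label, y_syn in [("original", y), ("MI", y_mi), ("DA-MI", y_da)]:
    print(label, round(quadratic_coef(x, y_syn), 2))
```

The linear synthesis model cannot carry the curvature, so the x² coefficient estimated from `y_mi` should sit near zero. Conditioning on the masked copy lets the DA-MI draw inherit part of it. With the noise variances chosen here, a short calculation puts the expected coefficient at about 0.75 rather than the true 1, so in this toy case the added robustness still comes with some attenuation.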


“…Volker and Vink [24] outline a workflow for generating synthetic data with the multiple imputation software mice. It was demonstrated in a simulation study that the analysis results obtained on synthetic data yielded unbiased and valid statistical inference.…”

Section: Missing Data and Synthetic Data (mentioning; confidence: 99%)

“…It was demonstrated in a simulation study that the analysis results obtained on synthetic data yielded unbiased and valid statistical inference. Volker and Vink [24] argue that the ease of use when synthesizing data with mice, along with the validity of inferences obtained, demonstrates rich possibilities for data dissemination.…”

Section: Missing Data and Synthetic Data (mentioning; confidence: 99%)
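Valid inference from m synthetic datasets requires combining the per-dataset results. One rule used for synthetic data is Reiter's (2003) rule for partially synthetic data. It pools the variance as T = ū + b/m, where Rubin's rule for missing data uses ū + (1 + 1/m)b. The sketch below assumes this rule fits the mice workflow described here. `pool_synthetic` is a hypothetical Python helper, not a mice function.

```python
import math
import statistics


def pool_synthetic(estimates, variances):
    """Pool m estimates and their squared standard errors from m synthetic
    datasets with the partially synthetic data rule T = u_bar + b / m."""
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates and one variance per estimate")
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    b = statistics.variance(estimates)   # between-dataset variance (divisor m - 1)
    u_bar = statistics.fmean(variances)  # mean within-dataset variance
    t = u_bar + b / m                    # total variance
    # Degrees of freedom of the t reference distribution (infinite if b == 0).
    df = (m - 1) * (1 + u_bar / (b / m)) ** 2 if b > 0 else math.inf
    return q_bar, math.sqrt(t), df


# e.g., one regression coefficient estimated on m = 5 synthetic datasets
q_bar, se, df = pool_synthetic([0.50, 0.55, 0.45, 0.52, 0.48], [0.01] * 5)
```

Because synthetic values are drawn from a model fitted to complete observed data, no "missing information" term is added. The between-dataset variance enters only as b/m.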

Editorial of the Psych Special Issue “Computational Aspects, Statistical Algorithms and Software in Psychometrics”

Robitzsch (2022), Psych


“…The implementation of PPC in mice (version 3.13.15) is straightforward. A new argument, where, is included in the mice function, which allows us to replace the observed data by randomly drawing values from the posterior predictive distribution (Volker and Vink, 2021). Here is an example of generating replications of the observed data.…”

Section: mice Package (mentioning; confidence: 99%)
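In mice, the `where` argument flags the cells to overwrite. A rough Python analogue of that idea replaces each flagged cell with a draw from the posterior predictive distribution. It assumes a Bayesian normal linear regression for one target column, fitted to all observed rows. This is a simplification: mice's own iteration over columns is not reproduced, and `replace_where` is a hypothetical function, not the mice API.

```python
import numpy as np


def replace_where(data, where, target, rng):
    """Replace the cells of column `target` flagged in the boolean matrix `where`
    with posterior predictive draws from a Bayesian linear regression of the
    target on all other columns (prior ∝ 1/sigma^2)."""
    out = data.copy()
    rows = where[:, target]
    others = [k for k in range(data.shape[1]) if k != target]
    X = np.column_stack([np.ones(len(data)), data[:, others]])
    y = data[:, target]
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    sigma2 = float(np.sum((y - X @ beta_hat) ** 2)) / rng.chisquare(n - k)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    out[rows, target] = X[rows] @ beta + np.sqrt(sigma2) * rng.standard_normal(rows.sum())
    return out


rng = np.random.default_rng(11)
n = 200
x = rng.standard_normal(n)
y = 2.0 + 1.5 * x + rng.standard_normal(n)
data = np.column_stack([x, y])

where = np.zeros(data.shape, dtype=bool)
where[:, 1] = True  # flag every y value, i.e. replace all observed y
replicates = [replace_where(data, where, target=1, rng=rng) for _ in range(5)]
```

Flagging only a subset of cells (for example, only sensitive records) gives partial synthesis with the same function; unflagged cells are left untouched.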

Informed strategies for multivariate missing data

Cai
