Integrative learning of multiple datasets has the potential to mitigate the challenge of small n and large p that is often encountered in the analysis of big biomedical data such as genomics data. Detection of weak yet important signals can be enhanced by jointly selecting features for all datasets. However, the set of important features may not always be the same across all datasets. Although some existing integrative learning methods allow a heterogeneous sparsity structure, in which a subset of datasets can have zero coefficients for some selected features, they tend to yield reduced efficiency, reintroducing the problem of losing weak important signals. We propose a new integrative learning approach that not only aggregates important signals well under a homogeneous sparsity structure, but also substantially alleviates the problem of losing weak important signals under a heterogeneous sparsity structure. Our approach exploits an a priori known graphical structure of the features and encourages joint selection of features that are connected in the graph. Integrating such prior information over multiple datasets enhances power while also accounting for the heterogeneity across datasets. Theoretical properties of the proposed method are investigated. We also demonstrate the limitations of existing approaches and the superiority of our method using a simulation study and an analysis of gene expression data from ADNI.
Missing data are present in most real-world problems and need careful handling to preserve prediction accuracy and statistical consistency in downstream analysis. Multiple imputation (MI), the gold standard for handling missing data, accounts for imputation uncertainty and provides proper statistical inference. In this work, we propose Multiple Imputation via Generative Adversarial Network (MI-GAN), a deep learning-based (specifically, GAN-based) multiple imputation method that can work under the missing at random (MAR) mechanism with theoretical support. MI-GAN leverages recent progress in conditional generative adversarial networks and, in terms of imputation error, shows strong performance matching existing state-of-the-art imputation methods on high-dimensional datasets. In particular, MI-GAN significantly outperforms other imputation methods in terms of statistical inference and computational speed.
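The abstract does not say how estimates from the imputed datasets are combined for inference. As a point of reference, the conventional approach in multiple imputation is Rubin's rules, which pool the per-imputation point estimates and their variances. The following is a minimal sketch of that standard pooling step, not of MI-GAN itself; the function name `pool_rubin` and the example numbers are illustrative assumptions.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates with Rubin's rules (illustrative helper).

    estimates: point estimates of one parameter, one per imputed dataset.
    variances: squared standard errors of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates from three imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the imputation uncertainty: a single imputation would report only the within-imputation variance and so understate the standard error.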