bioRxiv preprint first posted online Nov. 10, 2017; doi: http://dx.doi.org/10.1101/217554. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Multi-omic studies in large cohorts promise to characterize biological processes across molecular layers including genome, transcriptome, epigenome, proteome and perturbation phenotypes. However, methods for integrating multi-omic datasets in an unsupervised manner are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in a multi-omics dataset. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability across data modalities, thereby enabling a variety of downstream analyses, including factor annotation, data imputation and the detection of outlier samples. We applied MOFA to a study of 200 patient samples of chronic lymphocytic leukemia (CLL) profiled for somatic mutations, RNA expression, DNA methylation and ex vivo responses to a panel of 63 drugs. MOFA discovered known dimensions of disease heterogeneity, including immunoglobulin heavy chain variable region (IGHV) status and trisomy of chromosome 12, as well as previously underappreciated drivers of variation, such as response to oxidative stress. These learnt factors capture key dimensions of inter-patient heterogeneity and enhance prediction accuracy of clinical outcomes.
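
To make the idea of hidden factors shared across data modalities concrete, the following is a minimal sketch on simulated data. It is not MOFA's inference procedure: as a deliberately simple stand-in, it stacks the column-centered data matrices and takes a truncated singular value decomposition, then reports how much variance the recovered factors explain in each modality. All function names, the dimensions, and the noise level are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_views(n_samples=200, n_factors=3, view_dims=(50, 80, 30),
                   noise=0.1, seed=0):
    """Simulate several data modalities ("views") driven by shared factors.

    Hypothetical toy generator: each view is Y = Z W^T + Gaussian noise,
    with the same factor matrix Z (samples x factors) for every view.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_samples, n_factors))
    views = []
    for d in view_dims:
        W = rng.standard_normal((d, n_factors))  # view-specific loadings
        Y = Z @ W.T + noise * rng.standard_normal((n_samples, d))
        views.append(Y)
    return Z, views


def shared_factors_svd(views, n_factors):
    """Estimate shared factors by truncated SVD of the stacked views.

    Stand-in technique (plain SVD), NOT the MOFA model itself.
    Returns factor scores and one loading matrix per view.
    """
    centered = [Y - Y.mean(axis=0) for Y in views]
    stacked = np.hstack(centered)  # samples x (sum of feature counts)
    U, S, Vt = np.linalg.svd(stacked, full_matrices=False)
    Z_hat = U[:, :n_factors] * S[:n_factors]   # factor scores
    W_hat = Vt[:n_factors].T                   # loadings for all features
    # Split the loadings back into one block per view.
    boundaries = np.cumsum([Y.shape[1] for Y in views])[:-1]
    W_views = np.split(W_hat, boundaries, axis=0)
    return Z_hat, W_views


def variance_explained(views, Z_hat, W_views):
    """Fraction of variance in each view explained by the factors."""
    r2 = []
    for Y, W in zip(views, W_views):
        Yc = Y - Y.mean(axis=0)
        residual = Yc - Z_hat @ W.T
        r2.append(1.0 - (residual ** 2).sum() / (Yc ** 2).sum())
    return r2


if __name__ == "__main__":
    Z_true, views = simulate_views()
    Z_hat, W_views = shared_factors_svd(views, n_factors=3)
    for i, r2 in enumerate(variance_explained(views, Z_hat, W_views)):
        print(f"view {i}: variance explained = {r2:.3f}")
```

Because every simulated view is generated from the same factors, a handful of components explains most of the variance in each view. Real multi-omics data are harder in ways this sketch ignores, such as missing values, differing data types across modalities, and factors that are active in only some of them.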