An optimized library for reference-based deconvolution of whole-blood biospecimens assayed using the Illumina HumanMethylationEPIC BeadArray

Salas, Lucas A.; Koestler, Devin C.; Butler, Rondi A.; Hansen, Helen M.; Wiencke, John K.; Kelsey, Karl T.; Christensen, Brock C.

doi:10.1186/s13059-018-1448-7

Cited by 283 publications

(336 citation statements)

References 45 publications

Supporting

Mentioning

331

Contrasting

“…Following fertilization, DNA methylation is erased and reestablished in concert with lineage commitment and cellular differentiation (Lee et al 2014). As lineage specific marks of DNA methylation have been successfully employed to detect the relative abundance of individual cell types in blood mixtures (Houseman et al 2012; Accomando et al 2014; Koestler et al 2016; Salas et al 2018) and because a significant proportion of progenitor and stem cell methylation events are mitotically stable throughout differentiation, it is possible that a common set of unchanging DNA methylation markers can trace a common cell ontogeny. Here, we describe a novel analytical pipeline that involves generating a library of stable CpG loci that are markers of the cell of origin for studying peripheral blood leukocytes.…”

Section: Introduction

mentioning

confidence: 99%

Tracing human stem cell lineage during development using DNA methylation

et al. 2018

Self Cite

Abstract: Stem cell maturation is a fundamental, yet poorly understood aspect of human development. We devised a DNA methylation signature deeply reminiscent of embryonal stem cells (a fetal cell origin signature, FCO) to interrogate the evolving character of multiple human tissues. The cell fraction displaying this FCO signature was highly dependent upon developmental stage (fetal vs adult), and in leukocytes, it described a dynamic transition during the first 5 years of life. Significant individual variation in the FCO signature of leukocytes was evident at birth, in childhood, and throughout adult life. The genes characterizing the signature included transcription factors and proteins intimately involved in embryonic development. We defined and applied a DNA methylation signature common among human fetal hematopoietic progenitor cells, and have shown that this signature traces the lineage of cells and informs the study of stem cell heterogeneity in humans under homeostatic conditions.

“…These datatypes are used to handle complex preprocessing calculations, while being abstracted away via a convenient command-line interface. Additional commands, such as the removal of nonautosomal sites, SNP removal (either via QC methods or post-QC by subsetting CpGs that are not in a list of CpGs supplied by meffil for the respective array platform), and reference-based cell-type estimation (constrained projection/quadratic programming) (Houseman et al, 2012; Jaffe and Irizarry, 2014; Salas et al, 2018), and class methods are available in the help documentation. In addition, a visualization module generates interactive 3-D representations of the data using UMAP and Plotly (Modern Analytic Apps for the Enterprise) for further inspection.…”

Section: Methods

mentioning

confidence: 99%

PyMethylProcess - highly parallelized preprocessing for DNA methylation array data

Levy, Titus, Salas, et al. 2019

Preprint

Self Cite

The ability to perform high-throughput preprocessing of methylation array data is essential in large scale methylation studies. While R is a convenient language for methylation analyses, performing highly parallelized preprocessing using Python can accelerate data preparation for downstream methylation analyses, including large scale production-ready machine learning pipelines. Here, we present a methylation data preprocessing pipeline called PyMethylProcess that is highly reproducible, scalable, and that can be quickly set-up and deployed through Docker and PIP.
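
The citation statement above mentions reference-based cell-type estimation by constrained projection. As a rough illustration only, the sketch below estimates cell-type fractions for one mixed sample by non-negative least squares against a reference matrix of cell-type methylation profiles. It is not the PyMethylProcess or Houseman implementation: it swaps quadratic programming for plain projected gradient descent, enforces only non-negativity, and the function name `estimate_cell_fractions` and the toy reference values are hypothetical.

```python
from typing import List


def estimate_cell_fractions(
    reference: List[List[float]],  # rows = CpGs, columns = cell types (beta values)
    mixture: List[float],          # beta values of the mixed sample, one per CpG
    n_iter: int = 5000,
) -> List[float]:
    """Minimise 0.5 * ||R w - y||^2 subject to w >= 0 by projected gradient descent."""
    n_cpgs = len(reference)
    n_types = len(reference[0])

    # The sum of squared entries equals trace(R^T R), an upper bound on the
    # largest eigenvalue of R^T R, so 1 / that value is a safe step size.
    step = 1.0 / sum(v * v for row in reference for v in row)

    weights = [1.0 / n_types] * n_types  # start from equal fractions
    for _ in range(n_iter):
        # Residual R w - y at each CpG.
        residual = [
            sum(r * w for r, w in zip(row, weights)) - y
            for row, y in zip(reference, mixture)
        ]
        # Gradient R^T (R w - y), one entry per cell type.
        gradient = [
            sum(reference[i][k] * residual[i] for i in range(n_cpgs))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Toy reference: 6 CpGs x 3 cell types (made-up values, not a real library).
toy_reference = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.2, 0.8, 0.1],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
toy_truth = [0.5, 0.3, 0.2]
toy_mixture = [sum(r * w for r, w in zip(row, toy_truth)) for row in toy_reference]

if __name__ == "__main__":
    print(estimate_cell_fractions(toy_reference, toy_mixture))
```

In practice the reference matrix would come from a curated library of discriminating CpGs, and a dedicated solver would normally replace the hand-written loop.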

“…Methylated regions of DNA (hypermethylated), are associated with condensed chromatin, and when present near gene promoters, repression of transcription. cell-type specific, EWAS often account for potential confounding from variation in biospecimen cell composition using reference-based, or reference-free approaches to infer cell type proportions [9][10][11][12].…”

Section: Introduction

mentioning

confidence: 99%

“…The one thousand most important CpGs from each group were extracted and overlapped with CpGs defined by the Hannum model to depict the concordance of important CpGs between MethylNet and the Hannum model. For a second task, MethylNet was configured for multi-target regression to estimate cell-type proportions. First, estimateCellCounts2, using the 450K legacy IDOL optimized library 11, was used to deconvolve the cell-type proportions from each sample to develop our best proxy to ground truth outcomes for training the model. The MethylNet model was trained on the estimateCellCounts2 estimates of cell-type proportions for six different immune cell-types. MethylNet was then compared to results derived from applying the 350 IDOL derived CpGs legacy library from FlowSorted.Blood.EPIC 53 using two different deconvolution methods Robust Partial Correlations (RPC) and Cibersort implemented in EpiDISH 54.…”

mentioning

confidence: 99%

MethylNet: An Automated and Modular Deep Learning Approach for DNA Methylation Analysis

Levy, Titus, Petersen, et al. 2019

Preprint

Self Cite

DNA methylation (DNAm) is an epigenetic regulator of gene expression programs that can be altered by environmental exposures, aging, and in pathogenesis. Traditional analyses that associate DNAm alterations with phenotypes suffer from multiple hypothesis testing and multicollinearity due to the high-dimensional, continuous, interacting and non-linear nature of the data. Deep learning analyses have shown much promise to study disease heterogeneity. DNAm deep learning approaches have not yet been formalized into user-friendly frameworks for execution, training, and interpreting models. Here, we have developed MethylNet to make predictions, generate new data, and uncover unknown heterogeneity with minimal user supervision. The results of our experiments indicate that MethylNet can study cellular differences, grasp higher order information of cancer sub-types, estimate age and capture factors associated with smoking in concordance with known differences. The ability of MethylNet to capture nonlinear interactions presents an opportunity for further study of unknown disease, cellular heterogeneity and aging processes.

challenges, many downstream EWAS analyses have focused on reducing the dimensions into a rich feature set to associate with outcomes. By limiting the number of features through dimensionality reduction and feature selection, analyses become more computationally tractable and the burden of correcting for multiple comparisons is reduced. An important advancement to methylation-based deep learning analyses was the application of Variational Auto-encoders (VAE). Initial deep learning approaches for DNAm data focused on estimating methylation status and imputation, performing classification and regression tasks, and performing embeddings of CpG methylation states to extract biologically meaningful lower-dimensional features [15][16][17][18][19][20][21][22]. VAEs embed the methylation profiles in a way that represents the original data with high fidelity while revealing nuances 4,5,23. Thereafter, researchers attempted to develop similar frameworks for extracting features for downstream prediction tasks and identify meaningful relationships revealed by VAE latent representations 24. However, VAE models are sensitive to the selection of hyperparameters 25 and have not been optimized for synthetic data generation, latent space exploration, and prediction tasks. Many autoencoder approaches represent the data using an encoder, and then utilize a non-neural network model (e.g. support vector machine) to finalize the predictions. Presently, to the best of our knowledge there is no end-to-end training approach that both extracts biologically meaningful features through latent encoding and performs predictions using the derived features. Further, existing frameworks do not output predictions for multi-target regression tasks, such as cell-type deconvolution and subject age prediction. Here, we leverage deep learning latent space regression and classification tasks through the development of a modular framework that is ...
