Comparison of location-scale and matrix factorization batch effect removal methods on gene expression datasets

Renard, Emilie; Absil, Pierre-Antoine

doi:10.1109/bibm.2017.8217888


2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2017


Cited by 3 publications

(2 citation statements)

References 22 publications


“…Therefore, instead of using QCs, well-known methods are introduced from other omics areas, particularly genomics, that can remove batch effects from subject samples directly. They can be classified into two main approaches: location-scale methods and matrix factorization methods. Location-scale methods assume a model for data distribution within a batch, and adjust the data within each batch to fit this model.…”

mentioning

confidence: 99%

“…Overfitting results from the small number of QCs for the training models and cannot be avoided. Therefore, instead of using QCs, well-known methods are introduced from other omics areas, particularly genomics, that can remove batch effects from subject samples directly. They can be classified into two main approaches: location-scale methods and matrix factorization methods.…”

mentioning

confidence: 99%
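The location-scale idea described in the citation statements above can be illustrated with a minimal sketch. It is not the method of the cited paper or of any specific package. It assumes the simplest possible model: within each batch, one feature has its own mean and standard deviation, and correction moves every batch to the grand mean and the pooled within-batch standard deviation. The function name `location_scale_adjust` and the choice of target location and scale are assumptions made for illustration only.

```python
import math
from statistics import mean, pvariance


def location_scale_adjust(values, batches):
    """Adjust one feature so every batch shares a common location and scale.

    values  : list of floats, one measurement per sample
    batches : list of batch labels, same length as values
    Target location is the grand mean; target scale is the pooled
    within-batch standard deviation (both are illustrative assumptions).
    """
    if len(values) != len(batches):
        raise ValueError("values and batches must have the same length")

    # Group sample indices by batch label.
    groups = {}
    for i, label in enumerate(batches):
        groups.setdefault(label, []).append(i)

    grand_mean = mean(values)

    # Pooled within-batch variance, weighted by batch size.
    pooled_var = sum(
        len(idx) * pvariance([values[i] for i in idx]) for idx in groups.values()
    ) / len(values)
    pooled_sd = math.sqrt(pooled_var)

    adjusted = [0.0] * len(values)
    for idx in groups.values():
        batch_vals = [values[i] for i in idx]
        b_mean = mean(batch_vals)
        b_sd = math.sqrt(pvariance(batch_vals))
        # A constant batch has no scale to correct; only shift its location.
        scale = pooled_sd / b_sd if b_sd > 0 else 1.0
        for i in idx:
            adjusted[i] = (values[i] - b_mean) * scale + grand_mean
    return adjusted


if __name__ == "__main__":
    # Two batches with the same spread but a shifted location.
    vals = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
    labs = ["a", "a", "a", "b", "b", "b"]
    print(location_scale_adjust(vals, labs))
```

In the toy example, the two batches differ only by an additive shift, so after adjustment both are centered on the grand mean. Matrix factorization methods, the other family named in the statements, would instead estimate batch-related components from the whole data matrix and are not covered by this sketch.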


NormAE: Deep Adversarial Learning Model to Remove Batch Effects in Liquid Chromatography Mass Spectrometry-Based Metabolomics Data

et al. 2020


Untargeted metabolomics based on liquid chromatography−mass spectrometry is affected by nonlinear batch effects, which cover up biological effects, result in nonreproducibility, and are difficult to calibrate. In this study, we propose a novel deep learning model, called Normalization Autoencoder (NormAE), which is based on nonlinear autoencoders (AEs) and adversarial learning. An additional classifier and ranker are trained to provide adversarial regularization during the training of the AE model, latent representations are extracted by the encoder, and then the decoder reconstructs the data without batch effects. The NormAE method was tested on two real metabolomics data sets. After calibration by NormAE, the quality control samples (QCs) for both data sets gathered most closely in a PCA score plot (average distances decreased from 56.550 and 52.476 to 7.383 and 14.075, respectively) and obtained the highest average correlation coefficients (from 0.873 and 0.907 to 0.997 for both). Additionally, NormAE significantly improved biomarker discovery (median number of differential peaks increased from 322 and 466 to 1140 and 1622, respectively). NormAE was compared with four commonly used batch effect removal methods. The results demonstrated that using NormAE produces the best calibration results.


DBnorm as an R package for the comparison and selection of appropriate statistical methods for batch effect correction in metabolomic studies

Bararpour, Gilardi, Carmeli, et al. 2021

Sci Rep


As a powerful phenotyping technology, metabolomics provides new opportunities in biomarker discovery through metabolome-wide association studies (MWAS) and the identification of metabolites having a regulatory effect in various biological processes. While mass spectrometry-based (MS) metabolomics assays are endowed with high throughput and sensitivity, MWAS are doomed to long-term data acquisition generating an overtime-analytical signal drift that can hinder the uncovering of real biologically relevant changes. We developed “dbnorm”, a package in the R environment, which allows for an easy comparison of the model performance of advanced statistical tools commonly used in metabolomics to remove batch effects from large metabolomics datasets. “dbnorm” integrates advanced statistical tools to inspect the dataset structure not only at the macroscopic (sample batches) scale, but also at the microscopic (metabolic features) level. To compare the model performance on data correction, “dbnorm” assigns a score that helps users identify the best fitting model for each dataset. In this study, we applied “dbnorm” to two large-scale metabolomics datasets as a proof of concept. We demonstrate that “dbnorm” allows for the accurate selection of the most appropriate statistical tool to efficiently remove the overtime signal drift and to focus on the relevant biological components of complex datasets.


A novel approach to risk exposure and epigenetics—the use of multidimensional context to gain insights into the early origins of cardiometabolic and neurocognitive health

Ng, Felix, Olson 2023

BMC Med


Background: Each mother–child dyad represents a unique combination of genetic and environmental factors. This constellation of variables impacts the expression of countless genes. Numerous studies have uncovered changes in DNA methylation (DNAm), a form of epigenetic regulation, in offspring related to maternal risk factors. How these changes work together to link maternal-child risks to childhood cardiometabolic and neurocognitive traits remains unknown. This question is a key research priority as such traits predispose to future non-communicable diseases (NCDs). We propose viewing risk and the genome through a multidimensional lens to identify common DNAm patterns shared among diverse risk profiles. Methods: We identified multifactorial Maternal Risk Profiles (MRPs) generated from population-based data (n = 15,454, Avon Longitudinal Study of Parents and Children (ALSPAC)). Using cord blood HumanMethylation450 BeadChip data, we identified genome-wide patterns of DNAm that co-vary with these MRPs. We tested the prospective relation of these DNAm patterns (n = 914) to future outcomes using decision tree analysis. We then tested the reproducibility of these patterns in (1) DNAm data at age 7 and 17 years within the same cohort (n = 973 and 974, respectively) and (2) cord DNAm in an independent cohort, the Generation R Study (n = 686). Results: We identified twenty MRP-related DNAm patterns at birth in ALSPAC. Four were prospectively related to cardiometabolic and/or neurocognitive childhood outcomes. These patterns were replicated in DNAm data from blood collected at later ages. Three of these patterns were externally validated in cord DNAm data in Generation R. Compared to previous literature, DNAm patterns exhibited novel spatial distribution across the genome that intersects with chromatin functional and tissue-specific signatures. Conclusions: To our knowledge, we are the first to leverage multifactorial population-wide data to detect patterns of variability in DNAm. This context-based approach decreases biases stemming from overreliance on specific samples or variables. We discovered molecular patterns demonstrating prospective and replicable relations to complex traits. Moreover, results suggest that patterns harbour a genome-wide organisation specific to chromatin regulation and target tissues. These preliminary findings warrant further investigation to better reflect the reality of human context in molecular studies of NCDs.
