Unsupervised extraction of stable expression signatures from public compendia with eADAGE

Tan, Jie; Doing, Georgia; Lewis, K.A.; Price, Courtney E.; Chen, Kathleen M.; Cady, K.B.; Perchuk, Barret; Laub, Michael T.; Hogan, Deborah A.; Greene, Casey S.

doi:10.1101/078659

Cited by 9 publications

(20 citation statements)

References 47 publications

(24 reference statements)

“…In addition to the input dataset to be analyzed, ADAGE signature analysis also requires an ADAGE model and the gene expression compendium of an organism from which the model was built. The P. aeruginosa compendium and (e)ADAGE models can be built following instructions in [9,12].…”

Section: Adage Signature Analysis Workflow (mentioning)

confidence: 99%

“…The concept of ADAGE signature was first introduced in [12]. To recap, in an ADAGE model, genes connect to nodes via weights and this vector of weights characterizes each node (Figure 2).…”

Section: Active Signature Detection (mentioning)

confidence: 99%
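
The statement above describes each node as characterized by its vector of gene weights. A minimal sketch of how such a weight vector could be turned into gene sets is given below, assuming that genes whose weights lie far from the node's mean weight are the ones of interest. The function name `node_signatures`, the gene identifiers, and the 2.5 standard deviation cutoff are illustrative assumptions and are not taken from the quoted text.

```python
from statistics import mean, pstdev


def node_signatures(weights, genes, cutoff=2.5):
    """Split one node's gene weights into positive and negative gene sets.

    weights: one weight per gene for a single node.
    genes:   gene identifiers, in the same order as weights.
    cutoff:  distance from the mean, in standard deviations
             (hypothetical default chosen for illustration).
    """
    mu = mean(weights)
    sigma = pstdev(weights)
    if sigma == 0:
        # All weights are identical, so no gene stands out.
        return [], []
    # Genes with unusually high weights on this node.
    positive = [g for g, w in zip(genes, weights) if w >= mu + cutoff * sigma]
    # Genes with unusually low (strongly negative) weights on this node.
    negative = [g for g, w in zip(genes, weights) if w <= mu - cutoff * sigma]
    return positive, negative


# Toy weight vector: 18 genes near zero, one strongly positive, one strongly negative.
toy_genes = [f"gene{i}" for i in range(20)]
toy_weights = [0.0] * 18 + [5.0, -5.0]
pos, neg = node_signatures(toy_weights, toy_genes)
print(pos, neg)
```

In this sketch each node yields up to two gene sets, one from each tail of its weight distribution. Whether the two tails should be treated separately or merged is a design choice the quoted statement does not settle.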

“…We provide an R package, intended for computationally inclined users and a web server intended for those without familiarity with the R programming language. The R package and the web server are both preloaded with a Pseudomonas aeruginosa gene expression compendium containing microarray samples measured on the Pae_G1a Affymetrix Pseudomonas aeruginosa array that were available on the ArrayExpress database [18] before July 31 2015, a previously published eADAGE model built on this compendium [12], and P.a. gene information retrieved from NCBI's ftp site.…”

Section: User Interface (mentioning)

confidence: 99%

“…those robust to noise. Our analysis of the genes that most influence each node previously revealed that they form gene sets that resemble human-annotated biological processes and pathways, which often exhibit consistent coexpression in large gene expression compendia [9,12]. We have termed such gene sets ADAGE signatures.…”

Section: Introduction (mentioning)

confidence: 99%

“…We have termed such gene sets ADAGE signatures. We developed eADAGE, which summarizes multiple ADAGE models into an ensemble model, to more robustly capture pathways and found that it covered significantly more biological pathways more precisely [12]. In addition to signatures that match curated pathways, eADAGE also extracts signatures that group genes that match known but uncurated pathways and others that may represent undiscovered biological processes.…”

Section: Introduction (mentioning)

confidence: 99%

ADAGE signature analysis: differential expression analysis with data-defined gene sets

Tan, Huyck, et al. 2017

Preprint

Self Cite

Background: Gene set enrichment analysis and overrepresentation analyses are commonly used methods to determine the biological processes affected by a differential expression experiment. This approach requires biologically relevant gene sets, which are currently curated manually, limiting their availability and accuracy in many organisms without extensively curated resources. New feature learning approaches can now be paired with existing data collections to directly extract functional gene sets from big data.

Results: Here we introduce a method to identify perturbed processes. In contrast with methods that use curated gene sets, this approach uses signatures extracted from public expression data. We first extract expression signatures from public data using ADAGE, a neural network-based feature extraction approach. We next identify signatures that are differentially active under a given treatment. Our results demonstrate that these signatures represent biological processes that are perturbed by the experiment. Because these signatures are directly learned from data without supervision, they can identify uncurated or novel biological processes. We implemented ADAGE signature analysis for the bacterial pathogen Pseudomonas aeruginosa. For the convenience of different user groups, we implemented both an R package (ADAGEpath) and a web server (http://adage.greenelab.com) to run these analyses. Both are open-source to allow easy expansion to other organisms or signature generation methods. We applied ADAGE signature analysis to an example dataset in which wild-type and Δanr mutant cells were grown as biofilms on the Cystic Fibrosis genotype bronchial epithelial cells. We mapped active signatures in the dataset to KEGG pathways and compared with pathways identified using GSEA. The two approaches generally return consistent results; however, ADAGE signature analysis also identified a signature that revealed the molecularly supported link between the MexT regulon and Anr.

Conclusions: We designed ADAGE signature analysis to perform gene set analysis using data-defined functional gene signatures. This approach addresses an important gap for biologists studying non-traditional model organisms and those without extensive curated resources available. We built both an R package and web server to provide ADAGE signature analysis to the community.

Toxicogenomics: A 2020 Vision

Liu, Huang, Roberts, et al. 2019

Trends in Pharmacological Sciences

101

Toxicogenomics (TGx) has contributed significantly to toxicology and now has great potential to support moves towards animal-free approaches in regulatory decision making. Here, we discuss in vitro TGx systems and their potential impact on risk assessment. We raise awareness of the rapid advancement of genomics technologies, which generates novel genomics features essential for enhanced risk assessment. We specifically emphasize the importance of reproducibility in utilizing TGx in the regulatory setting. We also highlight the role of machine learning (particularly deep learning) in developing TGx-based predictive models. Lastly, we touch on the topics of how TGx approaches could facilitate adverse outcome pathways (AOP) development and enhance read-across strategies to further regulatory application. Finally, we summarize current efforts to develop TGx for risk assessment and set out remaining challenges.

Toxicogenomics in Regulatory Application: Challenges and Opportunities

Animal models are used to assess and avoid risk to humans from exposure to potential hazards, but their use is under constant review, especially in the light of some reports of poor extrapolation for complex endpoints, such as hepatotoxicity and carcinogenicity. Consequently, 21st century toxicology emphasizes alternative means of risk assessment and the promotion of the 3Rs (replacement, reduction, and refinement of animals in toxicology testing) [1]. In Europe, great efforts have been made to advance the 3Rs with the aim of developing animal-free risk assessment methodologies. To this end, several high-profile programs are underway, such as the Framework Programme 7 (FP7), Horizon 2020, and some public-private partnerships, including Safety Evaluation Ultimately Replacing Animal Testing (SEURAT-1) and the Innovative Medicines Initiative (IMI). Furthermore, a series of EU Legislative directives have been developed and implemented over the past three decades, with an emphasis on moving away from animal testing; since 2013, animal models have been prohibited for testing cosmetics or household products in the EU, as well as in Israel and India [2]. In the US, government-initiated efforts comprise advanced regulatory sciences proposed by the US FDA [3] and Tox21 [4] [which involves four government agencies, including the Environmental Protection Agency (EPA), National Center for Advancing Translational Sciences, National Institute of Environmental Health Sciences, and the FDA] and ToxCast [5] (by the EPA). These ongoing efforts actively advocate and promote in silico and in vitro approaches, including toxicogenomics (TGx) (see Glossary), for prioritization and also for a potential application in risk assessment.

TGx, as a subdiscipline of toxicology, has been successfully implemented to address critical issues and questions in a broad spectrum of toxicology. The rapid advancement of next-generation sequencing (NGS) technologies has gained traction in clinical application, particularly in personalized cancer diagnosis and prognosis, offering great op...

Machine-Learned Molecular Surface and Its Application to Implicit Solvent Simulations

Wei, Zhao, Luo 2021

J. Chem. Theory Comput.

Implicit solvent models, such as Poisson-Boltzmann models, play important roles in computational studies of biomolecules. A vital step in almost all implicit solvent models is to determine the solvent-solute interface, and the solvent excluded surface (SES) is the most widely used interface definition in these models. However, classical algorithms used for computing SES are geometry-based, thus neither suitable for parallel implementations nor convenient for obtaining surface derivatives. To address the limitations, we explored a machine learning strategy to obtain a level-set formulation for the SES. The training process was conducted in three steps, eventually leading to a model with over 95% agreement with the classical SES. Visualization of tested molecular surfaces shows that the machine-learned SES overlaps with the classical SES in almost all situations. We also implemented the machine-learned SES into the Amber/PBSA program to study its performance on reaction field energy calculation. The analysis shows that the two sets of reaction field energies are highly consistent with 1% deviation on average. Given its level-set formulation, we expect the machine-learned SES to be applied in molecular simulations that require either surface derivatives or high efficiency on parallel computing platforms.
