The rapidly increasing availability of microbial genome sequences has led to a growing demand for bioinformatics software tools that support functional analysis based on the comparison of closely related genomes. By applying comparative approaches at the gene level, it is possible to gain insights into the core genes, which represent the set of features shared by the organisms under study. Conversely, singleton genes can be identified to elucidate the specific properties of an individual genome. Since its initial publication, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. In recent years, the software has been continuously improved, and a large number of new analysis features have been added. For the new version, EDGAR 2.0, the gene orthology estimation approach was redesigned and completely re-implemented. Among other new features, EDGAR 2.0 provides extended phylogenetic analysis features such as AAI (Average Amino Acid Identity) and ANI (Average Nucleotide Identity) matrices, genome set size statistics, and modernized visualizations such as interactive synteny plots and Venn diagrams. Thereby, the software supports a quick and user-friendly survey of evolutionary relationships between microbial genomes and simplifies the process of obtaining new biological insights into their differential gene content. All features are offered to the scientific community via a web-based, and therefore platform-independent, user interface that allows easy browsing of precomputed datasets. The web server is accessible at http://edgar.computational.bio.
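The core/singleton distinction above reduces to set operations once every gene has been assigned to an ortholog group. The following is a minimal sketch, not EDGAR's implementation: it assumes orthology has already been estimated upstream, and the function name `core_and_singletons` and the toy genome data are hypothetical.

```python
from collections import defaultdict


def core_and_singletons(assignments):
    """Split genes into core ortholog groups and per-genome singletons.

    `assignments` maps genome name -> {gene id: ortholog group id}.
    Assumption: ortholog groups were computed beforehand (not shown here).
    """
    # Record which genomes contribute at least one gene to each ortholog group.
    genomes_per_group = defaultdict(set)
    for genome, genes in assignments.items():
        for group in genes.values():
            genomes_per_group[group].add(genome)

    all_genomes = set(assignments)
    # Core: ortholog groups represented in every genome of the set.
    core = {group for group, members in genomes_per_group.items()
            if members == all_genomes}
    # Singletons: genes whose ortholog group occurs in exactly one genome.
    singletons = {
        genome: sorted(gene for gene, group in genes.items()
                       if genomes_per_group[group] == {genome})
        for genome, genes in assignments.items()
    }
    return core, singletons


# Hypothetical toy data: three genomes, four ortholog groups.
example = {
    "A": {"a1": "g1", "a2": "g2", "a3": "g3"},
    "B": {"b1": "g1", "b2": "g2"},
    "C": {"c1": "g1", "c4": "g4"},
}
core, singletons = core_and_singletons(example)
print(core)        # {'g1'}
print(singletons)  # {'A': ['a3'], 'B': [], 'C': ['c4']}
```

Counting the members of `genomes_per_group` in the same way would also give the intersections needed for a Venn diagram of the genome set.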
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. Critically, the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Here, we compare standard, data-driven feature selection approaches to feature selection driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, evaluating 2484 unique models. For 23 drugs, better predictive performance is achieved when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r = 0.75). Extending the drug-dependent features with gene expression signatures yields the most predictive models for 60 drugs, the best-performing example being Dabrafenib. For many compounds, even a very small subset of drug-related features is highly predictive of drug sensitivity. Small feature sets selected using prior knowledge are more predictive for drugs targeting specific genes and pathways, while models with wider feature sets perform better for drugs affecting general cellular mechanisms. Appropriate feature selection strategies facilitate the development of interpretable models that are informative for therapy design.
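Prior-knowledge feature selection amounts to restricting the expression matrix to genes named by the drug's targets, target pathways and, optionally, expression signatures, and then scoring predictions by the correlation between observed and predicted response. The sketch below is illustrative only: the helper names are hypothetical, the gene lists are toy examples rather than the curated annotations used in the study, and the regression model fitted on the selected columns is deliberately left out because the abstract does not specify it.

```python
import math


def select_prior_knowledge_features(all_genes, targets, pathway_genes,
                                    signature_genes=()):
    """Keep only genes named by prior knowledge, preserving column order."""
    allowed = set(targets) | set(pathway_genes) | set(signature_genes)
    return [gene for gene in all_genes if gene in allowed]


def subset_columns(matrix, all_genes, selected):
    """Restrict a cell-line x gene matrix (list of rows) to the selected genes."""
    index = {gene: j for j, gene in enumerate(all_genes)}
    cols = [index[gene] for gene in selected]
    return [[row[j] for j in cols] for row in matrix]


def pearson_r(x, y):
    """Pearson correlation, e.g. between observed and predicted drug response."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Return 0.0 for constant input instead of dividing by zero.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


# Illustrative only: BRAF as the target of Dabrafenib, MAP2K1/MAPK1 as pathway genes.
genes = ["BRAF", "MAP2K1", "EGFR", "TP53", "GAPDH"]
selected = select_prior_knowledge_features(genes, ["BRAF"], ["MAP2K1", "MAPK1"])
print(selected)  # ['BRAF', 'MAP2K1']
X = [[1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0, 10.0]]
print(subset_columns(X, genes, selected))  # [[1.0, 2.0], [6.0, 7.0]]
```

Prior-knowledge genes that are absent from the expression panel (here `MAPK1`) are silently dropped, which keeps the feature matrix aligned with the available data.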
The deployment of next‐generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is feasible to analyze not only single genomes but also large groups of related genomes in a comparative approach. Whole genome sequencing of type strain genomes also holds huge potential for obtaining a higher resolution phylogenetic and taxonomic classification. In the past 9 years, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. During this time, the software has been continuously improved, and a large number of new analysis features have been added. In recent years, core‐genome‐based phylogenomic/taxonomic analysis has become a main application field of EDGAR. With a focus on generating genome sequences of all type strains of prokaryotic species, the basic 16S rRNA gene sequence phylogeny can be significantly extended to a higher resolution core‐genome‐based taxonomy, and labor‐intensive DNA–DNA hybridization (DDH) can be replaced by genome‐sequence‐based indices, which reflect species borders in the same manner as DDH. The web‐based user interface of EDGAR offers all tools required for phylogenomic inter‐ and intraspecies taxonomic analyses as needed for the proposal of novel species. EDGAR calculates core‐genome‐based phylogenetic trees with neighbor‐joining and maximum‐likelihood methods as well as average amino acid identity (AAI) and average nucleotide identity (ANI) matrices. Furthermore, it offers convenient visualization features such as Venn diagrams, synteny plots, and a comparative view of the genomic neighborhood of orthologous genes. Recently, the software was extended to include various new features, such as statistical analyses, replicon grouping options, and second‐level analyses of meta gene sets.
Thus, the software enables a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. EDGAR also provides public databases of precomputed projects with comparative genomics and phylogenomic results. The platform provides 322 genus‐based public databases comprising 8,079 complete genomes. Besides those genus‐based projects, in this article we present 226 new public projects that are clustered at the family level and use type strain genomes, including draft genomes. These new public projects comprise a further 4,400 genomes. EDGAR is free for academic use and funded as a service by the German Network for Bioinformatics Infrastructure – de.NBI. EDGAR is available via the public web server http://edgar.computational.bio.
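An AAI matrix is, at its simplest, the mean percent identity of orthologous protein pairs for every pair of genomes. The sketch below assumes those per-pair identities have already been computed (for example from pairwise alignments of ortholog pairs); the function name and toy numbers are hypothetical, and EDGAR's actual pipeline may filter or weight alignments differently. An ANI matrix follows the same averaging idea on nucleotide-level matches.

```python
from itertools import combinations
from statistics import mean


def aai_matrix(genomes, pair_identities):
    """Build a symmetric AAI matrix as nested dicts.

    `pair_identities[(a, b)]` holds the percent identities of the ortholog
    pairs shared by genomes a and b (assumed to be precomputed upstream).
    Genome pairs without shared orthologs get None.
    """
    matrix = {g: {h: None for h in genomes} for g in genomes}
    for g in genomes:
        matrix[g][g] = 100.0  # a genome is identical to itself
    for a, b in combinations(genomes, 2):
        # Accept the key in either order.
        ids = pair_identities.get((a, b)) or pair_identities.get((b, a)) or []
        value = mean(ids) if ids else None
        matrix[a][b] = matrix[b][a] = value
    return matrix


# Hypothetical identities for three genomes.
identities = {
    ("G1", "G2"): [98.0, 96.0, 97.0],
    ("G1", "G3"): [80.0, 70.0],
    ("G3", "G2"): [75.0, 77.0],
}
m = aai_matrix(["G1", "G2", "G3"], identities)
print(m["G1"]["G2"], m["G1"]["G3"], m["G2"]["G3"])  # 97.0 75.0 76.0
```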
Drug sensitivity prediction constitutes one of the main challenges in personalized medicine. The major difficulty of this problem stems from the fact that the sensitivity of cancer cells to treatment depends on an unknown subset of a large number of biological features. Although feature selection is the key to interpretable results and identification of potential biomarkers, a comprehensive assessment of feature selection methods for drug sensitivity prediction has so far not been performed. We propose feature selection approaches driven by prior knowledge of drug targets, target pathways, and gene expression signatures. We assess these methodologies on the Genomics of Drug Sensitivity in Cancer (GDSC) dataset, a panel of around 1000 cell lines screened against multiple anticancer compounds. We compare our results with a baseline model utilizing genome-wide gene expression features and common data-driven feature selection techniques. In total, 2484 unique models were evaluated, providing a comprehensive study of feature selection strategies for the drug response prediction problem. For 23 drugs, the models achieve better predictive performance when the features are selected according to prior knowledge of drug targets and pathways. The best correlation between observed and predicted response on the test set is achieved for Linifanib (r=0.75). Extending the drug-dependent features with gene expression signatures yields the models most predictive of drug response for 60 drugs, the best-performing example being Dabrafenib. Examples of how pre-selection of features benefits model interpretability are given for Dabrafenib, Linifanib and Quizartinib. Based on GDSC drug data, we find that feature selection driven by prior knowledge tends to yield better results for drugs targeting specific genes and pathways, while models with genome-wide features perform better for drugs affecting general mechanisms such as metabolism and DNA replication.
For a significant group of compounds, even a very small number of features based on simple drug properties is often highly predictive of drug sensitivity, can explain the mechanism of drug action, and can serve as guidelines for prescription. In general, choosing appropriate feature selection strategies has the potential to yield interpretable models that are informative for therapy design.
Pharmacogenomics | Machine learning | Personalized medicine
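For contrast with the prior-knowledge approach, a common data-driven baseline ranks genes by their absolute correlation with drug response on the training cell lines and keeps the top k. The abstract does not say which data-driven techniques were compared, so this correlation filter is one representative choice, not necessarily one of the study's methods; the function names and toy data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def top_k_by_correlation(matrix, genes, response, k):
    """Rank genes by |r| with the response and keep the k strongest.

    `matrix` has one row per training cell line and one column per gene.
    Use training data only, so the test set stays unseen during selection.
    """
    scores = []
    for j, gene in enumerate(genes):
        column = [row[j] for row in matrix]
        scores.append((abs(pearson_r(column, response)), gene))
    # Highest |r| first; ties broken by gene name for a deterministic result.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [gene for _, gene in scores[:k]]


# Hypothetical toy data: 4 cell lines x 3 genes.
genes = ["G1", "G2", "G3"]
X = [[1, 4, 1], [2, 3, 3], [3, 2, 2], [4, 1, 1]]
y = [1.0, 2.0, 3.0, 4.0]
print(top_k_by_correlation(X, genes, y, 2))
```

A filter like this sees only statistics of the training data, which is why it can pick genes with no mechanistic link to the drug; prior-knowledge selection trades that flexibility for interpretability.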
Gene expression signatures have proven their potential to characterize important cancer phenomena like oncogenic signaling pathway activities, cellular origins of tumors, or immune cell infiltration into tumor tissues. Large collections of expression signatures provide the basis for their application to new data sets, but the applicability of each signature in a new experimental context must be reassessed. We apply a methodology that utilizes the previously developed concept of coherent expression of genes in signatures to identify translatable signatures before scoring their activity in single tumors. We present a web interface (www.rosettasx.com) that applies our methodology to expression data from the Cancer Cell Line Encyclopaedia and The Cancer Genome Atlas. Configurable heatmaps visualize per-cancer signature scores for 293 hand-curated literature-derived gene sets representing a wide range of cancer-relevant transcriptional modules and phenomena. The platform allows users to complement heatmaps of signature scores with molecular information on SNVs, CNVs, gene expression, gene dependency, and protein abundance, or to analyze their own signatures. Clustered heatmaps and further drill-down plots support users in studying oncological processes in cancer subtypes, thereby providing a rich resource to explore how mechanisms of cancer interact with each other, as demonstrated by exemplary analyses of two cancer types.
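The two-step idea above (check that a signature's genes are coherently expressed in the new data set, then score its activity per sample) can be sketched as follows. As a simplified stand-in for the published coherence concept, this sketch uses the mean pairwise Pearson correlation of the signature genes and scores samples by the mean z-score of those genes; the 0.3 threshold, the function names, and the toy data are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def coherence(expr, signature):
    """Mean pairwise correlation (across samples) of the signature genes in expr."""
    genes = [g for g in signature if g in expr]
    pairs = list(combinations(genes, 2))
    if not pairs:
        return float("nan")
    return mean(pearson_r(expr[a], expr[b]) for a, b in pairs)


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def signature_scores(expr, signature, min_coherence=0.3):
    """Per-sample scores, or None if the signature is not coherent here.

    `expr` maps gene -> expression values across the same ordered samples.
    """
    if not coherence(expr, signature) >= min_coherence:  # also rejects NaN
        return None
    z = [zscore(expr[g]) for g in signature if g in expr]
    return [mean(sample) for sample in zip(*z)]


# Hypothetical expression of three genes in four samples.
expr = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
print(signature_scores(expr, ["A", "B"]))  # coherent: scores rise across samples
print(signature_scores(expr, ["A", "C"]))  # anti-correlated genes: None
```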
The Hippo pathway is not only important for the control of organ size in animals but has also emerged as an important regulator of cancer development and progression. A robust surrogate for pathway activity is important for characterizing the role of this pathway within primary tumors or for distinct tumor models. Gene expression signatures have frequently been used as downstream integrators of pathway activity. To enable assessment of Hippo pathway activity in transcriptomic profiles of cancer patients and models, we developed a protocol to distill a multigene expression signature from gene expression data and gene dependency screens in cell lines. Based on Achilles and DRIVE shRNA data together with expression data from the CCLE cell line panel, we first developed a 16-gene signature that reflects association with YAP1/TEAD1 dependency and coexpression with YAP1. To focus only on direct targets of YAP/TAZ, we took the intersection of our 16-gene signature and the 22-gene YAP/TAZ target score by Wang et al., Cell Reports 2018. The resulting gene signature consists of the four genes AMOTL2, CRIM1, CYR61, and MYOF. We tested the robustness of this 4-gene Hippo signature on gene expression data from The Cancer Genome Atlas (TCGA). We found that the 4-gene Hippo signature delivers high coherence scores in nearly all cancer types of the TCGA cohort. For these indications, our analysis suggests that the 4 genes constitute a coherently regulated expression module that warrants further investigation. We developed a summary Hippo signature score to stratify patients or samples according to their pathway activity and compared this score with others from a collection of more than 250 gene expression signatures. We observed that the Hippo signature score is often highly correlated with several published signatures that capture epithelial-mesenchymal transition or cancer cell stemness.
A cohort enrichment analysis on the TCGA data identified indications in which most patients of the cohort have high signature scores and in which YAP/TAZ activity could be an oncogenic driver. To refine this set of indications further, we performed a survival analysis (both overall and progression-free). We selected only those indications where the prognosis for patients with a high signature score is worse than for patients with a low signature score. This analysis revealed that in mesothelioma (MESO), ovarian serous cystadenocarcinoma (OV), pancreatic adenocarcinoma (PAAD), and brain lower-grade glioma (LGG), the Hippo pathway could be an oncogenic driver and may represent a promising target for future therapies. Our findings can support drug development by suggesting preclinical models (cell lines, PDX models), cancer indications, and even patients within indications by using our 4-gene signature as a predictive biomarker for oncogenic YAP/TAZ activity. Citation Format: Johanna Mazur, Julian Kreis, Elisabeth Trivier, Christian Dillon, Dirk Wienke, Eike Staub. A 4-gene YAP-related pathway expression signature informs about dependence of tumors on Hippo pathway signaling [abstract]. In: Proceedings of the AACR Special Conference on the Hippo Pathway: Signaling, Cancer, and Beyond; 2019 May 8-11; San Diego, CA. Philadelphia (PA): AACR; Mol Cancer Res 2020;18(8_Suppl):Abstract nr A38.
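The abstract names the four signature genes but not the exact formula of the summary Hippo score. A common convention, assumed here and not confirmed by the source, is to z-score each gene across samples, average the four z-scores per sample, and split samples into high and low groups at the median score before comparing survival. The function names and expression values below are hypothetical.

```python
from statistics import mean, median, pstdev

# The four genes of the Hippo signature, as listed in the abstract.
HIPPO_GENES = ("AMOTL2", "CRIM1", "CYR61", "MYOF")


def zscore(values):
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def hippo_scores(expr):
    """Mean z-score of the four genes per sample (assumed scoring rule).

    `expr` maps gene -> expression values across the same ordered samples;
    a missing signature gene raises KeyError.
    """
    z = [zscore(expr[g]) for g in HIPPO_GENES]
    return [mean(sample) for sample in zip(*z)]


def stratify(scores):
    """Label samples above the median score 'high', the rest 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical expression values for four samples.
expr = {
    "AMOTL2": [1, 2, 3, 4],
    "CRIM1": [10, 20, 30, 40],
    "CYR61": [0.5, 1.0, 1.5, 2.0],
    "MYOF": [3, 4, 5, 6],
}
print(stratify(hippo_scores(expr)))  # ['low', 'low', 'high', 'high']
```

The resulting high/low labels are the kind of grouping a survival comparison between high-score and low-score patients would start from.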