Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time-consuming, which makes these images challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors.
Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time-consuming, which makes these images challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin slides from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995±0.008), as well as subtypes with lower but significant accuracy (AUC 0.87±0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88±0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45±0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65 to 0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial biology.
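The two summary statistics quoted in the abstract above, slide-level AUC and tile-level correlation between a pair of classifiers, can be illustrated with a minimal sketch. It assumes that each classifier emits one tumor probability per tile and one score per slide; the abstract does not spell out these details. The function names (`roc_auc`, `pearson`) and all numbers are hypothetical toy values, not data or code from the paper.

```python
from math import sqrt
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical slide-level data: 1 = tumor, 0 = normal.
slide_labels = [1, 1, 1, 0, 0]
slide_scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(slide_labels, slide_scores)

# Hypothetical tile probabilities from two classifiers on the same slide,
# listed in the same tile order.
tiles_a = [0.1, 0.4, 0.8, 0.9]
tiles_b = [0.2, 0.3, 0.7, 0.6]
r = pearson(tiles_a, tiles_b)

print(f"slide-level AUC: {auc:.3f}")
print(f"tile-level correlation: {r:.3f}")
```

Under these assumptions, a cross-tissue comparison would repeat the correlation for every classifier pair on each test slide and then average the results.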
The current standard of care for many patients with HER2-positive breast cancer is neoadjuvant chemotherapy in combination with anti-HER2 agents, based on HER2 amplification as detected by in situ hybridization (ISH) or protein immunohistochemistry (IHC). However, hematoxylin & eosin (H&E) tumor stains are more commonly available, and accurate prediction of HER2 status and anti-HER2 treatment response from H&E would reduce costs and increase the speed of treatment selection. Computational algorithms for H&E have been effective in predicting a variety of cancer features and clinical outcomes, including moderate success in predicting HER2 status. In this work, we present a novel convolutional neural network (CNN) approach able to predict HER2 status with increased accuracy over prior methods. We trained a CNN classifier on 188 H&E whole slide images (WSIs) manually annotated for tumor regions of interest (ROIs) by our pathology team. Our classifier achieved an area under the curve (AUC) of 0.90 in cross-validation of slide-level HER2 status and 0.81 on an independent TCGA test set. Within slides, we observed strong agreement between pathologist-annotated ROIs and blinded computational predictions of tumor regions / HER2 status. Moreover, we trained our classifier on pre-treatment samples from 187 HER2+ patients who subsequently received trastuzumab therapy. Our classifier achieved an AUC of 0.80 in five-fold cross-validation. Our work provides an H&E-based algorithm that can predict HER2 status and trastuzumab response in breast cancer at an accuracy that is better than IHC and may benefit clinical evaluations.
Background: Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Methods: Here, we analyse in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. Findings: By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Amongst these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. Interpretation: These associations provide a comprehensive resource to support drug development and human biology studies. Funding: This study was not supported by any formal funding bodies.
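The Gini importance mentioned above is built from Gini impurity decreases. A random forest sums these decreases over every tree node that splits on a given feature and averages the sums across trees. The sketch below is a deliberate simplification, not the authors' pipeline. It scores each hypothetical target by the impurity decrease of a single binary split (target hit or no hit) on a toy ADR label. The names `target_A` and `target_B` and all data are invented for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    # For two classes, 1 - p^2 - (1 - p)^2 simplifies to 2p(1 - p).
    return 2.0 * p * (1.0 - p)


def impurity_decrease(feature, labels):
    """Impurity decrease from splitting the labels on one binary feature."""
    left = [y for f, y in zip(feature, labels) if f == 0]
    right = [y for f, y in zip(feature, labels) if f == 1]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted


# Hypothetical toy data: six drugs, 1 = ADR reported, 0 = not reported.
adr = [1, 1, 1, 0, 0, 0]

# Hypothetical in vitro hit calls per target (1 = active against the target).
hits = {
    "target_A": [1, 1, 1, 0, 0, 0],  # separates the ADR drugs perfectly
    "target_B": [1, 0, 1, 0, 1, 0],  # only weakly informative
}

scores = {name: impurity_decrease(calls, adr) for name, calls in hits.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

for name in ranked:
    print(f"{name}: impurity decrease = {scores[name]:.4f}")
```

In this toy setting the perfectly separating target ranks first. A full random forest would aggregate many such decreases over bootstrapped trees and random feature subsets.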
Histopathological images are an integral data type for studying cancer. We show that pre-trained convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNNs with a common architecture trained on 19 cancer types of The Cancer Genome Atlas (TCGA), analyzing 14,459 hematoxylin and eosin scanned frozen tissue images. Our CNNs are based on the Inception-V3 network and classify TCGA pathologist-annotated tumor/normal status of whole slide images in all 19 cancer types with consistently high AUCs (0.995±0.008). Remarkably, CNNs trained on one tissue are effective in others (AUC 0.88±0.11), with classifier relationships recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45±0.16 between classifier pairs on the TCGA test sets. In particular, the TCGA-trained classifiers had average tile-level correlations of 0.52±0.09 and 0.58±0.08 on hold-out TCGA lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) test sets, respectively. These relations are reflected in two external datasets, namely the LUAD and LUSC whole slide images of the Clinical Proteomic Tumor Analysis Consortium. The CNNs trained on TCGA achieved cross-classification AUCs of 0.75±0.12 and 0.73±0.13 on the LUAD and LUSC external validation sets, respectively. These CNNs had average tile-level correlations of 0.38±0.09 and 0.39±0.08 on the LUAD and LUSC validation sets, respectively. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. This study illustrates that pre-trained CNNs can detect tumor features across a wide range of cancers, suggesting the presence of pan-cancer tumor features.
These shared features allow combining datasets when analyzing small samples to narrow down the parameter search space of CNN models. Citation Format: Javad Noorbakhsh, Saman Farahmand, Ali Foroughi pour, Sandeep Namburi, Dennis Caruana, David Rimm, Mohammad Soltanieh-ha, Kourosh Zarringhalam, Jeffrey H. Chuang. Deep learning identifies conserved pan-cancer tumor features [abstract]. In: Proceedings of the AACR Virtual Special Conference on Artificial Intelligence, Diagnosis, and Imaging; 2021 Jan 13-14. Philadelphia (PA): AACR; Clin Cancer Res 2021;27(5_Suppl):Abstract nr PO-003.
Inference of active regulatory mechanisms underlying specific molecular and environmental perturbations is essential for understanding cellular response. The success of inference algorithms relies on the quality and coverage of the underlying network of regulator–gene interactions. Several commercial platforms provide large and manually curated regulatory networks and functionality to perform inference on these networks. Adaptation of such platforms for open-source academic applications has been hindered by the lack of availability of accurate, high-coverage networks of regulatory interactions and integration of efficient causal inference algorithms. In this work, we present CIE, an integrated platform for causal inference of active regulatory mechanisms from differential gene expression data. Using a regularized Gaussian Graphical Model (GGM), we construct a transcriptional regulatory network by integrating publicly available ChIP-seq experiments with gene-expression data from tissue-specific RNA-seq experiments. Our GGM approach identifies high confidence transcription factor (TF)–gene interactions and annotates the interactions with information on mode of regulation (activation vs. repression). Benchmarks against manually curated databases of TF–gene interactions show that our method can accurately detect mode of regulation. We demonstrate the ability of our platform to identify active transcriptional regulators by using controlled in vitro overexpression and stem-cell differentiation studies and utilize our method to investigate transcriptional mechanisms of fibroblast phenotypic plasticity.
Adverse drug reactions (ADRs) are one of the leading causes of morbidity and mortality in health care. Understanding which drug targets are linked to ADRs can lead to the development of safer medicines. Here, we analyze in vitro secondary pharmacology of common (off) targets for 2134 marketed drugs. To associate these drugs with human ADRs, we utilized FDA Adverse Event Reports and developed random forest models that predict ADR occurrences from in vitro pharmacological profiles. By evaluating Gini importance scores of model features, we identify 221 target-ADR associations, which co-occur in PubMed abstracts to a greater extent than expected by chance. Among these are established relations, such as the association of in vitro hERG binding with cardiac arrhythmias, which further validate our machine learning approach. Evidence on bile acid metabolism supports our identification of associations between the Bile Salt Export Pump and renal, thyroid, lipid metabolism, respiratory tract and central nervous system disorders. Unexpectedly, our model suggests PDE3 is associated with 40 ADRs. These associations provide a comprehensive resource to support drug development and human biology studies.
Keywords: Adverse drug reactions, adverse event report, FAERS, secondary pharmacology, machine learning, statistical modeling, drug discovery & development, drug safety.
[...] occurrences of drugs but most importantly also extract biologically meaningful target-ADR links. Using an in vitro secondary pharmacology database of more than 2,000 marketed or withdrawn drugs (see Methods), we built a random forest model to predict drug-ADR and target-ADR associations. We validate drug-ADR predictions through systematic Side Effect Resource (SIDER) drug label analysis and 221 target-ADR predictions through systematic literature co-occurrence analysis. Furthermore, we find canonical target-ADR associations, such as hERG binding causing cardiac arrhythmias. We also encountered unexpected associations which warrant further investigations, such as a link between Phosphodiesterase 3 (PDE3) and several ADRs, including congenital renal and urinary tract disorders. We conclude our study with potential targets that are associated with cardiovascular and renal ADRs to demonstrate the utility and possible impact of this method in drug development and preclinical safety sciences by enabling prediction of ADRs from in vitro pharmacological profiles.
Inference of active regulatory mechanisms underlying specific molecular and environmental perturbations is essential for understanding cellular response. The success of inference algorithms relies on the quality and coverage of the underlying network of regulator-gene interactions. Several commercial platforms provide large and manually curated regulatory networks and functionality to perform inference on these networks. Adaptation of such platforms for open-source academic applications has been hindered by the lack of availability of accurate, high-coverage networks of regulatory interactions and integration of efficient causal inference algorithms. In this work, we present CIE, an integrated platform for causal inference of active regulatory mechanisms from differential gene expression data. Using a regularized Gaussian Graphical Model (GGM), we construct a transcriptional regulatory network by integrating publicly available ChIP-Seq experiments with gene-expression data from tissue-specific RNA-Seq experiments. Our GGM approach identifies high confidence transcription factor (TF)-gene interactions and annotates the interactions with information on mode of regulation (activation vs. repression). Benchmarks against manually curated databases of TF-gene interactions show that our method can accurately detect mode of regulation. We demonstrate the ability of our platform to identify active transcriptional regulators by using controlled in vitro overexpression and stem-cell differentiation studies and utilize our method to investigate transcriptional mechanisms of fibroblast phenotypic plasticity.