Key points
• Pan-cancer computational histopathology analysis with deep learning extracts histopathological patterns and accurately discriminates 28 cancer and 14 normal tissue types
• Computational histopathology predicts whole-genome duplications, focal amplifications and deletions, as well as driver gene mutations
• Widespread correlations with gene expression are indicative of immune infiltration and proliferation
• Prognostic information augments conventional grading and histopathological subtyping in the majority of cancers

Abstract
Here we use deep transfer learning to quantify histopathological patterns across 17,396 H&E-stained histopathology image slides from 28 cancer types and correlate these with underlying genomic and transcriptomic data. Pan-cancer computational histopathology (PC-CHiP) classifies the tissue of origin across organ sites and provides highly accurate, spatially resolved tumour and normal distinction within a given slide. The learned computational histopathological features correlate with a large range of recurrent genetic aberrations, including whole-genome duplications (WGDs), arm-level copy number gains and losses, and focal amplifications and deletions, as well as driver gene mutations within a range of cancer types. WGDs can be predicted in 25/27 cancer types (mean AUC = 0.79), including those that were not part of model training. Similarly, we observe associations with 25% of mRNA transcript levels, which enables learning and localising histopathological patterns of molecularly defined cell types on each slide. Lastly, we find that computational histopathology provides prognostic information augmenting histopathological subtyping and grading in the majority of cancers assessed, which pinpoints prognostically relevant areas such as necrosis or infiltrating lymphocytes on each tumour section.
Taken together, these findings highlight the potential of PC-CHiP to discover new molecular and prognostic associations that can augment diagnostic workflows, and they lay out a rationale for integrating molecular and histopathological data.
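The overall approach described above, extracting features from image tiles with a pretrained network and fitting a downstream model on those features to predict a genomic label such as WGD status, can be illustrated with a minimal sketch. All dimensions and data here are synthetic stand-ins (the real pipeline derives features from H&E tile images with a deep CNN); this is not PC-CHiP itself, only the transfer-learning pattern it relies on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for CNN-derived tile features: in a PC-CHiP-like pipeline each
# image tile is summarised by a feature vector from a pretrained network.
# Here we draw synthetic 64-dimensional features for 200 "tiles".
n, d = 200, 64
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
# Synthetic binary label (e.g. WGD yes/no) generated from a linear rule.
y = (X @ true_w > 0).astype(float)

# Downstream classifier: logistic regression by plain gradient descent
# on the (frozen) features, mirroring the transfer-learning setup.
w = np.zeros(d)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    w -= lr * X.T @ (p - y) / n

pred = (X @ w > 0).astype(float)
accuracy = (pred == y).mean()
```

Because the labels are linearly separable in the synthetic features, the linear head recovers them almost perfectly; in practice performance is bounded by how much genomic signal the histology features actually carry.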
Despite regional successes in controlling the SARS-CoV-2 pandemic, global cases reached an all-time high in April 2021, in part due to the evolution of more transmissible variants. Here we use the dense genomic surveillance generated by the COVID-19 Genomics UK Consortium to reconstruct the dynamics of 62 different lineages in each of 315 English local authorities between September 2020 and April 2021. This analysis reveals a series of sub-epidemics that peaked in the early autumn of 2020, followed by a singular jump in transmissibility of the B.1.1.7 lineage. B.1.1.7 grew when other lineages declined during the second national lockdown and regionally tiered restrictions between November and December 2020. A third, more stringent national lockdown eventually suppressed B.1.1.7 and eliminated nearly all other lineages in early 2021. However, a series of variants (mostly containing the spike E484K mutation) defied these trends and persisted at moderately increasing proportions. Accounting for sustained introductions, however, indicates that their transmissibility is unlikely to exceed that of B.1.1.7. Finally, B.1.617.2 was repeatedly introduced to England and grew rapidly in April 2021, constituting approximately 40% of sampled COVID-19 genomes on May 15.
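The growth of a more transmissible lineage against a declining background, as described for B.1.1.7, follows a logistic curve in lineage proportion when one lineage has a constant per-generation growth advantage. A minimal sketch (the advantage `s` and starting proportion are illustrative numbers, not estimates from the study):

```python
import numpy as np

# Two-lineage competition: if lineage B has growth-rate advantage s per
# generation over lineage A, the log-odds of sampling B grow linearly,
# so its proportion follows a logistic curve.
s = 0.6            # per-generation growth advantage (illustrative)
p0 = 0.01          # initial proportion of lineage B
gens = np.arange(0, 20)
odds = p0 / (1 - p0) * np.exp(s * gens)
proportion = odds / (1 + odds)
```

This is why a lineage at 1% can dominate within a few months even while absolute case numbers fall: the advantage acts multiplicatively on the odds each generation.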
Background
Health care is experiencing a drive towards digitisation, and many countries are implementing national health data resources. Digital medicine promises to identify individuals at elevated risk of disease who may benefit from screening or interventions. This is particularly needed for cancer, where early detection improves outcomes. While a range of cancer risk models exists, the utility of population-wide electronic health databases for risk stratification across cancer types has not been fully explored.
Methods
We use time-dependent Bayesian Cox hazards models built on modern machine learning frameworks to scale the statistical approach to 6.7 million Danish individuals covering 193 million life-years over the period 1978 to 2015. A set of 1,392 covariates from available clinical disease trajectories, text-mined basic health factors and family histories is used to train predictive models of 20 major cancer types. The models are validated on cancer incidence between 2015 and 2018 across Denmark and on 0.35 million individuals in the UK Biobank.
Findings
The predictive performance of the models exceeds age-sex-based predictions in all but one cancer type. Models trained on Danish data perform similarly on the UK Biobank in a direct transfer, without any additional retraining. Cancer risks are associated, in addition to heritable components, with a broad range of preceding diagnoses and health factors. The best overall performance was seen for cancers of the digestive system, but also for thyroid, kidney and uterine cancers. Risk-adapted cohorts may, on average, include individuals 25% younger than age-sex-based cohorts with similar incidence.
Interpretation
Data available in national electronic health databases can be used to approximate cancer risk factors and enable risk predictions for most cancer types. Model predictions generalise between the Danish and UK health care systems and may help to enable cancer screening in younger age groups.
Funding Novo Nordisk Foundation.
The Cox model is an indispensable tool for time-to-event analysis, particularly in biomedical research. However, medicine is undergoing a profound transformation, generating data at an unprecedented scale, which opens new frontiers for studying and understanding diseases. With the wealth of data collected, new challenges for statistical inference arise, as datasets are often high-dimensional, exhibit an increasing number of measurements at irregularly spaced time points, and are simply too large to fit in memory. Many current implementations for time-to-event analysis are ill-suited to these problems, as inference is computationally demanding and requires access to the full data at once. Here we propose a Bayesian version of the counting process representation of Cox's partial likelihood for efficient inference on large-scale datasets with millions of data points and thousands of time-dependent covariates. Through the combination of stochastic variational inference and a reweighting of the log-likelihood, we obtain an approximation of the posterior distribution that factorizes over subsamples of the data, enabling analysis in big-data settings. Crucially, the method produces viable uncertainty estimates for large-scale and high-dimensional datasets. We show the utility of our method through a simulation study and an application to myocardial infarction in the UK Biobank.
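The computational idea at the core of the abstract above, rescaling a subsample's log-likelihood so it stands in for the full-data objective, can be sketched for the Cox partial likelihood. This is a simplified Breslow-style version without tie handling, and the variational machinery is omitted; note also that risk sets computed within a subsample only approximate the full-data risk sets, which is precisely the subtlety the paper's reweighting addresses:

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event, weight=1.0):
    """Breslow-type Cox partial log-likelihood.

    `weight` rescales the sum so a minibatch of size n can stand in for a
    dataset of size N (weight = N / n), as in stochastic variational inference.
    """
    eta = X @ beta
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]            # everyone still at risk at t_i
        ll += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return weight * ll

rng = np.random.default_rng(1)
N, d = 50, 3
X = rng.normal(size=(N, d))
beta = rng.normal(size=d)
time = rng.exponential(size=N)
event = rng.random(N) < 0.7

full = cox_partial_loglik(beta, X, time, event)
# Sanity check: with the "subsample" equal to the full data, the reweighted
# estimate (weight = N / N = 1) recovers the full-data value exactly.
reweighted = cox_partial_loglik(beta, X, time, event, weight=N / N)
```

In an actual minibatch setting one would evaluate this on a random subsample with `weight = N / n`, yielding an unbiased-in-expectation surrogate for the full objective up to the risk-set approximation noted above.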
BACKGROUND Integrative brain tumour diagnostics indisputably requires comprehensive reporting of molecular markers. The 2021 WHO classification of central nervous system (CNS) tumours substantially increased the set of markers for routine evaluation, giving greater significance to DNA methylation analysis in diagnostics. Limited by investment and batching, smaller labs and clinics may suffer major delays in delivering clinical decisions. To make precision diagnostics accessible, we introduce an integrated computational histopathology and adaptive nanopore sequencing workflow for next-day CNS tumour diagnostics. METHODS We used CNS-CHiP, a multitask deep transfer learning model, to predict key molecular alterations and methylation classification from H&E-stained CNS tumour slides. For further characterisation and subtyping, we used the predictions to formulate a custom panel for each patient. Targeted sequencing and analyses were performed using Rapid-CNS2, a custom neuro-oncology nanopore sequencing pipeline for parallel copy-number, mutational and methylation analysis that is flexible in target selection, requires no additional library preparation, and can be initiated upon receipt of frozen sections. Sequencing was performed on a portable MinION or GridION. RESULTS We demonstrate our workflow on diagnostic samples received by the Department of Neuropathology, University Hospital Heidelberg. CNS-CHiP predicted multiple pathognomonic alterations (e.g. IDH mutation, 7 gain/10 loss) with reasonable accuracy, instantly providing basic information regarding the tumour type. Personalised panels enabled small target sizes, resulting in short sequencing times (up to 24 h) and competitive costs. The GPU-accelerated bioinformatics pipeline reduced analysis time from >24 h to <3 h.
CONCLUSIONS Our workflow harnessing histology-based molecular predictions to instruct targeted nanopore sequencing can be set up with low initial investment and has the potential to facilitate reporting of molecular results on the next day of sample collection. CNS-CHiP combined with Rapid-CNS2 thus aims to make CNS tumour diagnostics affordable and accessible to smaller hospitals and labs especially in low- and middle-income countries.
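The step of turning histology-based predictions into a per-patient target panel for adaptive sequencing can be sketched as follows. The gene names and coordinates are hypothetical placeholders, and `build_panel` is an illustrative helper, not part of CNS-CHiP or Rapid-CNS2:

```python
# Sketch: convert model-predicted alterations into a BED-style target list
# for adaptive nanopore sequencing. Coordinates below are placeholders,
# not a real diagnostic panel.
GENE_COORDS = {
    "IDH1": ("chr2", 208236227, 208266074),
    "TERT": ("chr5", 1253167, 1295047),
    "EGFR": ("chr7", 55019017, 55211628),
}

def build_panel(predicted_alterations, padding=10_000):
    """Emit BED-style lines (chrom, start, end, gene) for predicted targets,
    padded on both sides so flanking regions are also captured."""
    lines = []
    for gene in predicted_alterations:
        if gene in GENE_COORDS:
            chrom, start, end = GENE_COORDS[gene]
            lines.append(f"{chrom}\t{max(0, start - padding)}\t{end + padding}\t{gene}")
    return lines

panel = build_panel(["IDH1", "EGFR"])
```

Keeping the target list small is what makes the reported short sequencing times possible: adaptive sampling rejects reads outside these regions, concentrating throughput on the loci the histology model flagged.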