Identifying functional effects of noncoding variants is a major challenge in human genetics. To predict the noncoding-variant effects de novo from sequence, we developed a deep learning–based algorithmic framework, DeepSEA (http://deepsea.princeton.edu/), that directly learns a regulatory sequence code from large-scale chromatin-profiling data, enabling prediction of chromatin effects of sequence alterations with single-nucleotide sensitivity. We further used this capability to improve prioritization of functional variants including expression quantitative trait loci (eQTLs) and disease-associated variants.
Reduced lung function predicts mortality and is key to the diagnosis of chronic obstructive pulmonary disease (COPD). In a genome-wide association study in 400,102 individuals of European ancestry, we define 279 lung function signals, 139 of which are new. In combination, these variants strongly predict COPD in independent patient populations. Furthermore, the combined effect of these variants showed generalizability across smokers and never-smokers, and across ancestral groups. We highlight biological pathways, known and potential drug targets for COPD and, in phenome-wide association studies, autoimmune-related and other pleiotropic effects of lung function associated variants. This new genetic evidence has potential to improve future preventive and therapeutic strategies for COPD.
Key challenges for human genetics, precision medicine and evolutionary biology include deciphering the regulatory code of gene expression and understanding the transcriptional effects of genome variation. However, this is extremely difficult because of the enormous scale of the noncoding mutation space. We developed a deep learning-based framework, ExPecto, that can accurately predict, ab initio from a DNA sequence, the tissue-specific transcriptional effects of mutations, including those that are rare or that have not been observed. We prioritized causal variants within disease- or trait-associated loci from all publicly available genome-wide association studies and experimentally validated predictions for four immune-related diseases. By exploiting the scalability of ExPecto, we characterized the regulatory mutation space for human RNA polymerase II-transcribed genes by in silico saturation mutagenesis and profiled > 140 million promoter-proximal mutations. This enables probing of evolutionary constraints on gene expression and ab initio prediction of mutation disease effects, making ExPecto an end-to-end computational framework for the in silico prediction of expression and disease risk.
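As a rough illustration of the in silico saturation mutagenesis mentioned in this abstract, the sketch below enumerates every single-nucleotide substitution of a sequence and scores each mutant against the reference. It is a minimal sketch, not ExPecto's implementation: the function names are hypothetical, and `gc_fraction` is a toy stand-in for a trained expression model.

```python
from typing import Callable, Iterator, List, Tuple

BASES = "ACGT"


def saturation_mutants(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref_base, alt_base, mutated_sequence) for every
    possible single-nucleotide substitution of seq."""
    seq = seq.upper()
    for pos, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def gc_fraction(seq: str) -> float:
    """Toy scoring function (assumption): fraction of G/C bases.
    A real framework would call a trained sequence model here."""
    return sum(base in "GC" for base in seq) / len(seq)


def mutation_effects(
    seq: str, score: Callable[[str], float] = gc_fraction
) -> List[Tuple[int, str, str, float]]:
    """Return (position, ref, alt, effect) where effect is the mutant
    score minus the reference score."""
    ref_score = score(seq.upper())
    return [
        (pos, ref, alt, score(mutant) - ref_score)
        for pos, ref, alt, mutant in saturation_mutants(seq)
    ]


effects = mutation_effects("ACGT")
print(len(effects))  # three substitutions per position
```

A sequence of length n yields 3n mutants, which is why scanning promoter-proximal regions of many genes quickly reaches the scale of millions of scored mutations.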
We address the challenge of detecting the contribution of noncoding mutations to disease with a deep-learning-based framework that predicts specific regulatory effects and the deleterious impact of genetic variants. Applying this framework to 1,790 Autism Spectrum Disorder (ASD) simplex families reveals disease causality of noncoding mutations: ASD probands harbor both transcriptional and post-transcriptional regulation-disrupting de novo mutations of significantly higher functional impact than unaffected siblings. Further analysis suggests involvement of noncoding mutations in synaptic transmission and neuronal development, and taken together with prior studies reveal a convergent genetic landscape of coding and noncoding mutations in ASD. We demonstrate that sequences carrying prioritized proband mutations possess allele-specific regulatory activity, and highlight a link between noncoding mutations and IQ heterogeneity in ASD probands. Our predictive genomics framework illuminates the role of noncoding mutations in ASD, prioritizes high impact mutations for further study, and is broadly applicable to complex human diseases.
In metastatic prostate cancer (PCa) cells, imbalance between cell survival and death signals such as constitutive activation of phosphatidylinositol 3-kinase (PI3K)-Akt and inactivation of apoptosis-stimulated kinase (ASK1)-JNK pathways is often detected. Here, we show that DAB2IP protein, often down-regulated in PCa, is a potent growth inhibitor by inducing G0/G1 cell cycle arrest and is proapoptotic in response to stress. Gain of function study showed that DAB2IP can suppress the PI3K-Akt pathway and enhance ASK1 activation leading to cell apoptosis, whereas loss of DAB2IP expression resulted in PI3K-Akt activation and ASK1-JNK inactivation leading to accelerated PCa growth in vivo. Moreover, glandular epithelia from DAB2IP−/− animals exhibited hyperplasia and apoptotic defect. Structural functional analyses of DAB2IP protein indicate that both proline-rich (PR) and PERIOD-like (PER) domains, in addition to the critical role of C2 domain in ASK1 activity, are important for modulating PI3K-Akt activity. Thus, DAB2IP is a scaffold protein capable of bridging both survival and death signal molecules, which implies its role in maintaining cell homeostasis. Keywords: cell apoptosis, prostate cancer, signal transduction
The mammalian kidney develops through reciprocal interactions between the ureteric bud and the metanephric mesenchyme to give rise to the entire collecting system and the nephrons. Most of our knowledge of the developmental regulators driving this process arises from the study of gene expression and functional genetics in mice and other animal models. In order to shed light on human kidney development, we have used single-cell transcriptomics to characterize gene expression in different cell populations, and to study individual cell dynamics and lineage trajectories during development. Single-cell transcriptome analyses of 6414 cells from five individual specimens identified 11 initial clusters of specific renal cell types as defined by their gene expression profile. Further subclustering identifies progenitors, and mature and intermediate stages of differentiation for several renal lineages. Other lineages identified include mesangium, stroma, endothelial and immune cells. Novel markers for these cell types were revealed in the analysis, as were components of key signaling pathways driving renal development in animal models. Altogether, we provide a comprehensive and dynamic gene expression profile of the developing human kidney at the single-cell level.
A genetic etiology is identified for one third of congenital heart disease (CHD) patients, including 8% attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed excess DNVs in associated genes (27 genes vs. 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (OR = 2.5, 95% CI 1.1–5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in five of 31 enhancers assayed. Finally, we observed DNV burden in RNA-binding protein regulatory sites (OR = 1.13, 95% CI 1.1–1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as observed for damaging coding DNVs.
Epigenomic profiling has enabled large-scale identification of regulatory elements, yet we still lack a systematic mapping from any sequence or variant to regulatory activities. We address this challenge with Sei, a framework for integrating human genetics data with sequence information to discover the regulatory basis of traits and diseases. Sei learns a vocabulary of regulatory activities, called sequence classes, using a deep learning model that predicts 21,907 chromatin profiles across >1,300 cell lines and tissues. Sequence classes provide a global classification and quantification of sequence and variant effects based on diverse regulatory activities, such as cell type-specific enhancer functions. These predictions are supported by tissue-specific expression, expression quantitative trait loci and evolutionary constraint data. Furthermore, sequence classes enable characterization of the tissue-specific, regulatory architecture of complex traits and generate mechanistic hypotheses for individual regulatory pathogenic mutations. We provide Sei as a resource to elucidate the regulatory basis of human health and disease.