Most variants implicated in common human disease by Genome-Wide Association Studies (GWAS) lie in non-coding sequence intervals. Despite the suggestion that regulatory element disruption represents a common theme, identifying causal risk variants within indicted genomic regions remains a significant challenge. Here we present a novel sequence-based computational method to predict the effect of regulatory variation, using a classifier (gkm-SVM) which encodes cell-specific regulatory sequence vocabularies. The induced change in the gkm-SVM score, deltaSVM, quantifies the effect of variants. We show that deltaSVM accurately predicts the impact of SNPs on DNase I sensitivity in their native genomic context, and accurately predicts the results of dense mutagenesis of several enhancers in reporter assays. Previously validated GWAS SNPs yield large deltaSVM scores, and we predict novel risk SNPs for several autoimmune diseases. Thus, deltaSVM provides a powerful computational approach for systematically identifying functional regulatory variants.
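The abstract above defines deltaSVM as the change in classifier score that a variant induces. As a rough illustration only, the sketch below scores a sequence by summing per-k-mer weights and takes the difference between the alternate and reference alleles. The weight table, the toy k-mer length of 3, the use of contiguous rather than gapped k-mers, and all function names are assumptions made for this example; they are not the authors' trained gkm-SVM model or software.

```python
from typing import Dict, Tuple

K = 3  # toy k-mer length, chosen only to keep the example small (assumption)

# Hypothetical k-mer weights standing in for a trained classifier's vocabulary.
TOY_WEIGHTS: Dict[str, float] = {
    "TAC": 0.25,
    "ACG": 1.0,
    "CGT": 0.5,
    "ATG": -0.5,
}


def score(seq: str, weights: Dict[str, float], k: int = K) -> float:
    """Sum the weights of every k-mer in seq; unseen k-mers contribute 0."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def snp_windows(context: str, pos: int, alt: str, k: int = K) -> Tuple[str, str]:
    """Return (ref, alt) windows spanning all k-mers that overlap position pos."""
    start = max(0, pos - (k - 1))
    end = min(len(context), pos + k)
    ref_window = context[start:end]
    alt_window = context[start:pos] + alt + context[pos + 1:end]
    return ref_window, alt_window


def delta_score(context: str, pos: int, alt: str,
                weights: Dict[str, float], k: int = K) -> float:
    """Score change induced by substituting alt at pos (alt minus ref)."""
    ref_window, alt_window = snp_windows(context, pos, alt, k)
    return score(alt_window, weights, k) - score(ref_window, weights, k)


# Usage: a C>T substitution at index 3 of a toy sequence.
context = "TTACGTAA"
ref_w, alt_w = snp_windows(context, 3, "T")
d = delta_score(context, 3, "T", TOY_WEIGHTS)
print(ref_w, alt_w, d)
```

In this toy case the substitution removes k-mers with positive weights and introduces one with a negative weight, so the score change is negative, which is the pattern one would expect for a variant that weakens a regulatory element under this kind of model.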
The identification of common variants that contribute to the genesis of human inherited disorders remains a significant challenge. Hirschsprung disease (HSCR) is a multifactorial, non-mendelian disorder in which rare high-penetrance coding sequence mutations in the receptor tyrosine kinase RET contribute to risk in combination with mutations at other genes. We have used family-based association studies to identify a disease interval, and integrated this with comparative and functional genomic analysis to prioritize conserved and functional elements within which mutations can be sought. We now show that a common non-coding RET variant within a conserved enhancer-like sequence in intron 1 is significantly associated with HSCR susceptibility and makes a 20-fold greater contribution to risk than rare alleles do. This mutation reduces in vitro enhancer activity markedly, has low penetrance, has different genetic effects in males and females, and explains several features of the complex inheritance pattern of HSCR. Thus, common low-penetrance variants, identified by association studies, can underlie both common and rare diseases.
Evolutionary sequence conservation is an accepted criterion to identify noncoding regulatory sequences. We have used a transposon-based transgenic assay in zebrafish to evaluate noncoding sequences at the zebrafish ret locus, conserved among teleosts, and at the human RET locus, conserved among mammals. Most teleost sequences directed ret-specific reporter gene expression, with many displaying overlapping regulatory control. The majority of human RET noncoding sequences also directed ret-specific expression in zebrafish. Thus, vast amounts of functional sequence information may exist that would not be detected by sequence similarity approaches.
We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.
The major gene for Hirschsprung disease (HSCR) encodes the receptor tyrosine kinase RET. In a study of 690 European- and 192 Chinese-descent probands and their parents or controls, we demonstrate the ubiquity of a >4-fold susceptibility from a C→T allele (rs2435357: p = 3.9 × 10⁻⁴³ in European ancestry; p = 1.1 × 10⁻²¹ in Chinese samples) that probably arose once within the intronic RET enhancer MCS+9.7. With in vitro assays, we now show that the T variant disrupts a SOX10 binding site within MCS+9.7 that compromises RET transactivation. The T allele, with a control frequency of 20%-30%/47% and case frequency of 54%-62%/88% in European/Chinese-ancestry individuals, is involved in all forms of HSCR. It is marginally associated with proband gender (p = 0.13) and significantly so with length of aganglionosis (p = 7.6 × 10⁻⁵) and familiality (p = 6.2 × 10⁻⁴). The enhancer variant is more frequent in the common forms of male, short-segment, and simplex families whereas multiple, rare, coding mutations are the norm in the less common and more severe forms of female, long-segment, and multiplex families. The T variant also increases penetrance in patients with rare RET coding mutations. Thus, both rare and common mutations, individually and together, make contributions to the risk of HSCR. The distribution of RET variants in diverse HSCR patients suggests a "cellular-recessive" genetic model where both RET alleles' function is compromised. The RET allelic series, and its genotype-phenotype correlations, shows that success in variant identification in complex disorders may strongly depend on which patients are studied.
Evaluating the biological relevance of the myriad putative regulatory noncoding sequences in vertebrate genomes represents a huge challenge. Functional analyses in vivo have typically relied on costly and labor-intensive transgenic strategies in mice. Transgenesis has also been applied in nonrodent vertebrates, such as zebrafish, but until recently these efforts have been hampered by significant mosaicism and poor rates of germline transmission. We have developed a transgenic strategy in zebrafish based on the Tol2 transposon, a mobile element that was recently identified in another teleost, Medaka. This method takes advantage of the increased efficiency of genome integration that is afforded by this intact DNA transposon, activity that is mediated by the corresponding transposase protein. The approach described in this protocol uses a universal vector system that permits rapid incorporation of DNA that is tagged with sequence targets for site-specific recombination. To evaluate the regulatory potential of a candidate sequence, the desired interval is PCR-amplified using sequence-specific primers that are flanked by the requisite target sites for cloning, and recombined into a universal expression plasmid (pGW_cfosEGFP). Purified recombinant DNAs are then injected into 1-2-cell zebrafish embryos and the resulting reporter expression patterns are analyzed at desired timepoints during development. This system is amenable to large-scale application, facilitating rapid functional analysis of noncoding sequences from both mammalian and teleost species.
Increased transforming growth factor beta (TGF-β) signaling has been implicated in the pathogenesis of syndromic presentations of aortic aneurysm, including Marfan syndrome (MFS) and Loeys-Dietz syndrome (LDS) [1-4]. However, the location and character of many of the causal mutations in LDS would intuitively imply diminished TGF-β signaling [5]. Taken together, these data have engendered controversy regarding the specific role of TGF-β in disease pathogenesis. Shprintzen-Goldberg syndrome (SGS) has considerable phenotypic overlap with MFS and LDS, including aortic aneurysm [6-8]. We identified causative variation in 10 patients with SGS in the proto-oncogene SKI, a known repressor of TGF-β activity [9,10]. Cultured patient dermal fibroblasts showed enhanced activation of TGF-β signaling cascades and increased expression of TGF-β responsive genes. Morpholino-induced silencing of SKI paralogs in zebrafish recapitulated abnormalities seen in SGS patients. These data support the conclusion that increased TGF-β signaling is the mechanism underlying SGS and contributes to multiple syndromic presentations of aortic aneurysm.