Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g. mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified in the human genome disease-associated SNPs that significantly alter the global conformation of the UTR to which they map. For six disease states (Hyperferritinemia Cataract Syndrome, β-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5′ UTRs of FTL and RB1) SNP-induced conformational changes analogous to those observed in bacterial regulatory riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a “RiboSNitch,” that is, a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble.
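The ensemble comparison described in this abstract can be illustrated with a minimal sketch. It assumes that base-pair probability matrices for the wild-type and mutant sequences have already been produced by a partition function tool, and it uses a Pearson correlation of per-nucleotide pairing probabilities as the similarity measure. That choice of measure, the function names, and the toy matrices are assumptions made for illustration and are not taken from the abstract.

```python
from math import sqrt


def pairing_profile(bpp):
    """Per-nucleotide pairing probability: row sums of a symmetric
    base-pair probability matrix (bpp[i][j] = P(i pairs with j))."""
    n = len(bpp)
    return [sum(bpp[i][j] for j in range(n)) for i in range(n)]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with >= 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def ensemble_similarity(bpp_wt, bpp_mut):
    """Hypothetical score: values near 1 mean the SNP barely changes
    the pairing profile; lower values mean a larger ensemble change."""
    return pearson(pairing_profile(bpp_wt), pairing_profile(bpp_mut))


# Toy 4-nt matrices (illustrative values, not real predictions).
wt = [[0.0, 0.0, 0.0, 0.9],
      [0.0, 0.0, 0.8, 0.0],
      [0.0, 0.8, 0.0, 0.0],
      [0.9, 0.0, 0.0, 0.0]]
mut = [[0.0, 0.9, 0.0, 0.0],
       [0.9, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.2],
       [0.0, 0.0, 0.2, 0.0]]

print(ensemble_similarity(wt, wt))
print(ensemble_similarity(wt, mut))
```

Comparing whole-ensemble profiles rather than a single minimum free energy structure is the point of the design: a SNP can leave the lowest-energy fold unchanged while still shifting the distribution of alternative conformations.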
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene’s proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene’s regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen’s Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. 
In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, ncCADD and ncGWAVA, and find both scores are significantly predictive of human dosage sensitive genes and appear to carry information beyond conservation, as assessed by ncGERP. These results highlight that the intolerance of noncoding sequence stretches in the human genome can provide a critical complementary tool to other genome annotation approaches to help identify the parts of the human genome increasingly likely to harbor mutations that influence risk of disease.
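The idea of comparing observed and predicted levels of standing variation can be sketched as a regression residual. The example below assumes, by analogy with the general residual-score approach, that each gene's count of common variants in its regulatory sequence is regressed on its total variant count and that the standardized residual serves as the score. The function name, the exact regression design, and the toy counts are hypothetical and not taken from the abstract.

```python
from math import sqrt


def intolerance_residuals(total_variants, common_variants):
    """Fit common ~ total by ordinary least squares across genes and
    return each gene's residual divided by the residual standard error.
    Negative values = fewer common variants than predicted."""
    n = len(total_variants)
    if n != len(common_variants) or n < 3:
        raise ValueError("need matching lists with at least 3 genes")
    mean_x = sum(total_variants) / n
    mean_y = sum(common_variants) / n
    sxx = sum((x - mean_x) ** 2 for x in total_variants)
    if sxx == 0:
        raise ValueError("total variant counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(total_variants, common_variants))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(total_variants, common_variants)]
    # Residual standard error with n - 2 degrees of freedom.
    sd = sqrt(sum(r * r for r in residuals) / (n - 2))
    if sd == 0:
        return [0.0] * n
    return [r / sd for r in residuals]


# Toy counts for five hypothetical genes (not real data).
total = [10, 20, 30, 40, 50]
common = [2, 4, 1, 8, 10]
scores = intolerance_residuals(total, common)
most_intolerant = scores.index(min(scores))
print(most_intolerant)  # index of the gene with the most negative score
```

Under this sketch, a strongly negative score marks a gene whose regulatory sequence carries less common variation than its overall variant load predicts, which is the pattern expected under purifying selection.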
Identifying the underlying causes of disease requires accurate interpretation of genetic variants. Current methods ineffectively capture pathogenic non-coding variants in genic regions, resulting in overlooking synonymous and intronic variants when searching for disease risk. Here we present the Transcript-inferred Pathogenicity (TraP) score, which uses sequence context alterations to reliably identify non-coding variation that causes disease. High TraP scores single out extremely rare variants with lower minor allele frequencies than missense variants. TraP accurately distinguishes known pathogenic and benign variants in synonymous (AUC = 0.88) and intronic (AUC = 0.83) public datasets, dismissing benign variants with exceptionally high specificity. TraP analysis of 843 exomes from epilepsy family trios identifies synonymous variants in known epilepsy genes, thus pinpointing risk factors of disease from non-coding sequence data. TraP outperforms leading methods in identifying non-coding variants that are pathogenic and is therefore a valuable tool for use in gene discovery and the interpretation of personal genomes.
Folding to a well-defined conformation is essential for the function of structured ribonucleic acids (RNAs) like the ribosome and tRNA. Structured elements in the untranslated regions (UTRs) of specific messenger RNAs (mRNAs) are known to control expression. The importance of unstructured regions adopting multiple conformations, however, is still poorly understood. High-resolution SHAPE-directed Boltzmann suboptimal sampling of the Homo sapiens Retinoblastoma 1 (RB1) 5′ UTR yields three distinct conformations compatible with the experimental data. Private single nucleotide variants (SNVs) identified in two patients with retinoblastoma each collapse the structural ensemble to a single but distinct well-defined conformation. The RB1 5′ UTRs from Bos taurus (cow) and Trichechus manatus latirostris (manatee) are divergent in sequence from H. sapiens (human) yet maintain structural compatibility with high-probability base pairs. SHAPE chemical probing of the cow and manatee RB1 5′ UTRs reveals that they also adopt multiple conformations. Luciferase reporter assays reveal that 5′ UTR mutations alter RB1 expression. In a traditional model of disease, causative SNVs disrupt a key structural element in the RNA. For the subset of patients with heritable retinoblastoma-associated SNVs in the RB1 5′ UTR, the absence of multiple structures is likely causative of the cancer. Our data therefore suggest that selective pressure will favor multiple conformations in eukaryotic UTRs to regulate expression.
Purpose: An emerging approach in medical genetics is to identify de novo mutations in patients with severe early-onset genetic disease that are absent in population controls and in the patient’s parents. This approach, however, frequently misses post-zygotic “mosaic” mutations that are present in only a portion of the healthy parents’ cells and are transmitted to offspring. Methods: We constructed a mosaic transmission screen for variants that have an ~50% alternative allele ratio in the proband but are significantly less than 50% in the transmitting parent. We applied it to two family-based genetic disease cohorts consisting of 9 cases of sudden unexplained death in childhood (SUDC) and 338 previously published cases of epileptic encephalopathy. Results: The screen identified six parental-mosaic transmissions across the two cohorts. The resultant rate of ~0.02 identified transmissions per trio is far lower than that of de novo mutations. Among these transmissions were two likely disease-causing mutations: an SCN1A mutation transmitted to an SUDC proband and her sibling with Dravet syndrome, as well as an SLC6A1 mutation in a proband with epileptic encephalopathy. Conclusion: These results highlight explicit screening for mosaic mutations as an important complement to the established approach of screening for de novo mutations.
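The screening criterion in the Methods can be sketched as a simple read-count filter. The example assumes a one-sided exact binomial test against a 50% allele ratio for the parent and a fixed ratio window for the proband. The significance threshold, the window bounds, and the function names are hypothetical choices for illustration, not the authors' published parameters.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))


def is_candidate_mosaic_transmission(proband_alt, proband_depth,
                                     parent_alt, parent_depth,
                                     alpha=0.001,
                                     proband_window=(0.4, 0.6)):
    """Flag a variant whose proband allele ratio looks heterozygous
    (~50%) while the parent carries the allele at a ratio that is
    significantly below 50%. Thresholds are illustrative assumptions."""
    if proband_depth == 0 or parent_depth == 0:
        return False
    if parent_alt == 0:
        # No alternative reads in the parent: not a transmission
        # candidate under this sketch (looks de novo instead).
        return False
    low, high = proband_window
    proband_ratio = proband_alt / proband_depth
    if not (low <= proband_ratio <= high):
        return False
    # One-sided test: is the parental ratio significantly below 0.5?
    p_value = binom_cdf(parent_alt, parent_depth, 0.5)
    return p_value < alpha


# Toy read counts (not real data).
print(is_candidate_mosaic_transmission(48, 100, 8, 100))   # True
print(is_candidate_mosaic_transmission(48, 100, 45, 100))  # False
print(is_candidate_mosaic_transmission(48, 100, 0, 100))   # False
```

Requiring at least one alternative read in the parent is what separates this screen from a de novo screen, which looks for variants with no parental support at all.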
A majority of SNPs (single nucleotide polymorphisms) map to noncoding and intergenic regions of the genome. Noncoding SNPs are often identified in genome-wide association studies (GWAS) as strongly associated with human disease. Two such disease-associated SNPs in the 5′ UTR of the human FTL (Ferritin Light Chain) gene are predicted to alter the ensemble of structures adopted by the mRNA. High-accuracy single nucleotide resolution chemical mapping reveals that these SNPs result in substantial changes in the structural ensemble in agreement with the computational prediction. Furthermore six rescue mutations are correctly predicted to restore the mRNA to its wild-type ensemble. Our data confirm that the FTL 5′ UTR is a “RiboSNitch,” an RNA that changes structure if a particular disease-associated SNP is present. The structural change observed is analogous to that of a bacterial Riboswitch in that it likely regulates translation. These data further suggest that specific pairs of SNPs in high linkage disequilibrium (LD) will form RNA structure-stabilizing haplotypes (SSHs). We identified 484 SNP pairs that form SSHs in UTRs of the human genome, and in eight of the 10 SSH-containing transcripts, SNP pairs stabilize RNA protein binding sites. The ubiquitous nature of SSHs in the transcriptome suggests that certain haplotypes are conserved to avoid RiboSNitch formation.
Anorexia nervosa (AN) and obsessive-compulsive disorder (OCD) are often comorbid and likely to share genetic risk factors. Hence, we examine their shared genetic background using a cross-disorder GWAS meta-analysis of 3495 AN cases, 2688 OCD cases, and 18,013 controls. We confirmed a high genetic correlation between AN and OCD (r = 0.49 ± 0.13, p = 9.07 × 10) and a sizable SNP heritability (SNP h = 0.21 ± 0.02) for the cross-disorder phenotype. Although no individual loci reached genome-wide significance, the cross-disorder phenotype showed strong positive genetic correlations with other psychiatric phenotypes (e.g., r = 0.36 with bipolar disorder and 0.34 with neuroticism) and negative genetic correlations with metabolic phenotypes (e.g., r = -0.25 with body mass index and -0.20 with triglycerides). Follow-up analyses revealed that although AN and OCD overlap heavily in their shared risk with other psychiatric phenotypes, the relationship with metabolic and anthropometric traits is markedly stronger for AN than for OCD. We further tested whether shared genetic risk for AN/OCD was associated with particular tissue or cell-type gene expression patterns and found that the basal ganglia and medium spiny neurons were most enriched for AN-OCD risk, consistent with neurobiological findings for both disorders. Our results confirm and extend genetic epidemiological findings of shared risk between AN and OCD and suggest that larger GWASs are warranted.
RNA conformation plays a significant role in stability, ligand binding, transcription and translation. Single nucleotide variants (SNVs) have the potential to disrupt specific structural elements because RNA folds in a sequence specific manner. A riboSNitch is an element of RNA structure with a specific function that is disrupted by an SNV or SNP (single nucleotide variant or polymorphism; SNVs occur with low frequency in the population, <1%). The riboSNitch is analogous to a riboswitch, where binding of a small molecule rather than mutation alters the structure of the RNA to control gene regulation. RiboSNitches are particularly relevant to interpreting the results of genome-wide association studies (GWAS). Often GWAS identify SNPs associated with a phenotype mapping to non-coding regions of the genome. Since a majority of the human genome is transcribed, significant subsets of GWAS SNPs are putative riboSNitches. The extent to which the transcriptome is tolerant of SNP-induced structure change is still poorly understood. Recent advances in ultra-high throughput structure probing begin to reveal the structural complexities of mutation induced structure change. This review summarizes our current understanding of SNV and SNP-induced structure change in the human transcriptome and discusses the importance of riboSNitch discovery in interpreting GWAS results and massive sequencing projects.