As the sequencing of healthy and disease genomes becomes more commonplace, detailed annotation provides interpretation for individual variation responsible for normal and disease phenotypes. Current approaches focus on direct changes in protein coding genes, particularly nonsynonymous mutations that directly affect the gene product. However, most individual variation occurs outside of genes and, indeed, most markers generated from genome-wide association studies (GWAS) identify variants outside of coding segments. Identification of potential regulatory changes that perturb these sites will lead to a better localization of truly functional variants and interpretation of their effects. We have developed a novel approach and database, RegulomeDB, which guides interpretation of regulatory variants in the human genome. RegulomeDB includes high-throughput, experimental data sets from ENCODE and other sources, as well as computational predictions and manual annotations to identify putative regulatory potential and identify functional variants. These data sources are combined into a powerful tool that scores variants to help separate functional variants from a large pool and provides a small set of putative sites with testable hypotheses as to their function. We demonstrate the applicability of this tool to the annotation of noncoding variants from 69 fully sequenced genomes as well as that of a personal genome, where thousands of functionally associated variants were identified. Moreover, we demonstrate a GWAS where the database is able to quickly identify the known associated functional variant and provide a hypothesis as to its function. Overall, we expect this approach and resource to be valuable for the annotation of human genome sequences.
[Supplemental material is available for this article.] The increasing number of sequenced human genomes is providing a catalog of the large number of individual variations present in the human genome (The International HapMap Consortium 2005; The 1000 Genomes Project Consortium 2010). Many of these variants are expected to be responsible for normal and disease phenotypes. Similarly, large, genome-wide association studies (GWAS) continue to map diseases to associated genomic regions from large cohorts of individuals (Hindorff et al. 2012). Initial interpretation of results generated by both of these approaches has been limited to DNA regions that cause disruption of gene function through coding sequence changes typically identified using an application such as PolyPhen-2 (Adzhubei et al. 2010). However, ~95% of known variants within sequenced genomes and 88% of those variants from GWAS studies fall outside of coding regions and have been difficult to interpret. Both large consortia and individual labs are generating a significant amount of regulatory information that is providing a better interpretation of the noncoding portions of the genome. The ENCODE Project, in particular, has mapped open chromatin and protein binding regions for large numbers of factors across many cell type...
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when integrating multiple sources of functional information and when highest confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
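The linkage disequilibrium point in the abstract above can be made concrete with the standard r² statistic between two biallelic SNPs. The following is a minimal sketch and not taken from the paper: the function name `ld_r_squared` and the haplotype counts are hypothetical, and phased haplotype counts are assumed to be available.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Return r^2 between two biallelic SNPs from phased haplotype counts.

    n_AB, n_Ab, n_aB, n_ab are counts of the four haplotypes, where A/a are
    the alleles at the first SNP and B/b the alleles at the second SNP.
    (Illustrative helper; names and inputs are assumptions.)
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total              # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total      # frequency of allele A
    p_B = (n_AB + n_aB) / total      # frequency of allele B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_AB - p_A * p_B             # LD coefficient D
    return d * d / denom


# Hypothetical counts: alleles always travel together vs. independently.
perfect = ld_r_squared(50, 0, 0, 50)
independent = ld_r_squared(25, 25, 25, 25)
```

When r² is close to 1, the reported SNP and its neighbor carry nearly the same association signal, which is why functional evidence is needed to choose between them.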
With multiple genome-wide association studies (GWAS) performed across autoimmune diseases, there is a great opportunity to study the homogeneity of genetic architectures across autoimmune disease. Previous approaches have been limited in the scope of their analysis and have failed to properly incorporate the direction of allele-specific disease associations for SNPs. In this work, we refine the notion of a genetic variation profile for a given disease to capture strength of association with multiple SNPs in an allele-specific fashion. We apply this method to compare genetic variation profiles of six autoimmune diseases: multiple sclerosis (MS), ankylosing spondylitis (AS), autoimmune thyroid disease (ATD), rheumatoid arthritis (RA), Crohn's disease (CD), and type 1 diabetes (T1D), as well as five non-autoimmune diseases. We quantify pair-wise relationships between these diseases and find two broad clusters of autoimmune disease where SNPs that make an individual susceptible to one class of autoimmune disease also protect from diseases in the other autoimmune class. We find that RA and AS form one such class, and MS and ATD another. We identify specific SNPs and genes with opposite risk profiles for these two classes. We furthermore explore individual SNPs that play an important role in defining similarities and differences between disease pairs. We present a novel, systematic, cross-platform approach to identify allele-specific relationships between disease pairs based on genetic variation as well as the individual SNPs which drive the relationships. While recognizing similarities between diseases might lead to identifying novel treatment options, detecting differences between diseases previously thought to be similar may point to key novel disease-specific genes and pathways.
Many genes with important roles in development and disease contain exceptionally long introns, but special mechanisms for their expression have not been investigated. We present bioinformatic, phylogenetic, and experimental evidence in Drosophila for a mechanism that subdivides many large introns by recursive splicing at nonexonic elements and alternative exons. Recursive splice sites predicted with highly stringent criteria are found at much higher frequency than expected in the sense strands of introns >20 kb, but they are found only at the expected frequency on the antisense strands, and they are underrepresented within introns <10 kb. The predicted sites in long introns are highly conserved between Drosophila melanogaster and Drosophila pseudoobscura, despite extensive divergence of other sequences within the same introns. These patterns of enrichment and conservation indicate that recursive splice sites are advantageous in the context of long introns. Experimental analyses of in vivo processing intermediates and lariat products from four large introns in the unrelated genes kuzbanian, outspread, and Ultrabithorax confirmed that these introns are removed by a series of recursive splicing steps using the predicted nonexonic sites. Mutation of nonexonic site RP3 within Ultrabithorax also confirmed that recursive splicing is the predominant processing pathway even with a shortened version of the intron. We discuss currently known and potential roles for recursive splicing.
Men and women differ in susceptibility to many diseases and in responses to treatment. Recent advances in genome-wide association studies (GWAS) provide a wealth of data for associating genetic profiles with disease risk; however, in general, these data have not been systematically probed for sex differences in gene-disease associations. Incorporating sex into the analysis of GWAS results can elucidate new relationships between single nucleotide polymorphisms (SNPs) and human disease. In this study, we performed a sex-differentiated analysis on significant SNPs from GWAS data of the seven common diseases studied by the Wellcome Trust Case Control Consortium. We employed and compared three methods: logistic regression, Woolf’s test of heterogeneity, and a novel statistical metric that we developed called permutation method to assess sex effects (PMASE). After correction for false discovery, PMASE finds SNPs that are significantly associated with disease in only one sex. These sexually dimorphic SNP-disease associations occur in Coronary Artery Disease and Crohn’s Disease. GWAS analyses that fail to consider sex-specific effects may miss discovering sexual dimorphism in SNP-disease associations that give new insights into differences in disease mechanism between men and women.
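Of the three methods named in the abstract above, Woolf's test of heterogeneity has a standard textbook form: it asks whether the log odds ratios in separate strata (here, men and women) differ by more than their sampling variance allows. The sketch below is an illustration under stated assumptions and not the authors' code: the function name `woolf_heterogeneity` and all counts are hypothetical, every cell is assumed nonzero, and the p-value shortcut is valid only for two strata (one degree of freedom). PMASE is not sketched because the abstract does not describe its details.

```python
import math


def woolf_heterogeneity(tables):
    """Woolf's test of heterogeneity across stratified 2x2 tables.

    Each table is (a, b, c, d): cases with allele, cases without,
    controls with allele, controls without. All cells must be > 0.
    Returns (chi_square, degrees_of_freedom).
    """
    log_ors, weights = [], []
    for a, b, c, d in tables:
        log_ors.append(math.log((a * d) / (b * c)))
        # Inverse of Woolf's variance estimate for the log odds ratio.
        weights.append(1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d))
    # Inverse-variance weighted mean of the stratum log odds ratios.
    pooled = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
    chi_square = sum(w * (l - pooled) ** 2 for w, l in zip(weights, log_ors))
    return chi_square, len(tables) - 1


def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


# Hypothetical counts for one SNP, stratified by sex.
men = (30, 70, 10, 90)      # odds ratio well above 1
women = (20, 80, 20, 80)    # odds ratio exactly 1
stat, df = woolf_heterogeneity([men, women])
p_value = chi_square_p_value_df1(stat)
```

A small p-value suggests the SNP-disease odds ratio differs between the sexes. With more than two strata, the statistic would need a chi-square distribution with the corresponding degrees of freedom rather than the one-degree shortcut used here.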