The repetitive nature and complexity of some medically relevant genes pose a challenge for their accurate analysis in a clinical setting. The Genome in a Bottle Consortium has provided variant benchmark sets, but these exclude nearly four hundred medically relevant genes due to their repetitiveness or polymorphic complexity. Here we characterize 273 of these 395 challenging autosomal genes using a haplotype-resolved whole-genome assembly. This curated benchmark reports over 17,000 single nucleotide variations, 3,600 INDELs, and 200 structural variations for HG002 against each of the human genome references GRCh37 and GRCh38. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific missed variants for short- and long-read technologies in medically relevant genes including CBS, CRYAA, and KCNE1. When these false duplications are masked, variant recall can improve from 8% to 100%. Forming benchmarks from a haplotype-resolved whole-genome assembly may become a prototype for future benchmarks covering the whole genome.
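As a reminder of what the recall figures above measure, the following is a minimal sketch of the standard recall calculation against a benchmark. The counts are hypothetical and chosen only to reproduce the 8% and 100% values; they are not taken from the paper, and the function name is illustrative.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of benchmark variants that a call set recovers."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("benchmark contains no variants")
    return true_positives / total


# Hypothetical counts for one gene with 25 benchmark variants.
# Before masking: reads split between the gene and its false duplicate,
# so most variants are missed.
before_masking = recall(true_positives=2, false_negatives=23)

# After masking the false duplication: every benchmark variant is called.
after_masking = recall(true_positives=25, false_negatives=0)

print(f"before: {before_masking:.0%}, after: {after_masking:.0%}")
```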
Background: The human genome contains “dark” gene regions that cannot be adequately assembled or aligned using standard short-read sequencing technologies, preventing researchers from identifying mutations within these gene regions that may be relevant to human disease. Here, we identify regions with few mappable reads that we call dark by depth, and others that have ambiguous alignment, called camouflaged. We assess how well long-read or linked-read technologies resolve these regions. Results: Based on standard whole-genome Illumina sequencing data, we identify 36,794 dark regions in 6054 gene bodies from pathways important to human health, development, and reproduction. Of these gene bodies, 8.7% are completely dark and 35.2% are ≥ 5% dark. We identify dark regions that are present in protein-coding exons across 748 genes. Linked-read or long-read sequencing technologies from 10x Genomics, PacBio, and Oxford Nanopore Technologies reduce dark protein-coding regions to approximately 50.5%, 35.6%, and 9.6%, respectively. We present an algorithm to resolve most camouflaged regions and apply it to the Alzheimer’s Disease Sequencing Project. We rescue a rare ten-nucleotide frameshift deletion in CR1, a top Alzheimer’s disease gene, found in disease cases but not in controls. Conclusions: While we could not formally assess the association of the CR1 frameshift mutation with Alzheimer’s disease due to insufficient sample size, we believe it merits investigation in a larger cohort. There remain thousands of potentially important genomic regions overlooked by short-read sequencing that are largely resolved by long-read technologies. Electronic supplementary material: The online version of this article (10.1186/s13059-019-1707-2) contains supplementary material, which is available to authorized users.
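The distinction between "dark by depth" and "camouflaged" regions described above can be illustrated with a minimal sketch. It assumes each genomic position is summarized as the list of mapping qualities (MAPQ) of the reads covering it. The thresholds and function names below are hypothetical values chosen for illustration; the abstract does not state the cutoffs, and this is not the authors' algorithm.

```python
from typing import List, Tuple

# Hypothetical thresholds, for illustration only (not stated in the abstract).
MAX_DARK_DEPTH = 5        # positions with this many reads or fewer are "dark by depth"
LOW_MAPQ = 10             # reads below this MAPQ count as ambiguously aligned
LOW_MAPQ_FRACTION = 0.9   # share of ambiguous reads that marks a position "camouflaged"


def classify_position(mapqs: List[int]) -> str:
    """Label one position from the MAPQ values of the reads covering it."""
    depth = len(mapqs)
    if depth <= MAX_DARK_DEPTH:
        return "dark_by_depth"
    ambiguous = sum(1 for q in mapqs if q < LOW_MAPQ)
    if ambiguous / depth >= LOW_MAPQ_FRACTION:
        return "camouflaged"
    return "normal"


def merge_regions(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Merge runs of identical non-normal labels into half-open (start, end, label) regions."""
    regions = []
    start = 0
    current = "normal"
    # A trailing "normal" sentinel flushes a region that runs to the end.
    for i, label in enumerate(labels + ["normal"]):
        if label != current:
            if current != "normal":
                regions.append((start, i, current))
            start, current = i, label
    return regions


# Toy pileup: six positions, each a list of read MAPQ values.
pileup = [[60] * 20, [60] * 2, [], [0] * 10, [0] * 9 + [60], [60] * 10]
labels = [classify_position(p) for p in pileup]
regions = merge_regions(labels)
print(regions)
```

In practice the per-position MAPQ lists would come from an alignment file; the toy pileup here only shows how the two categories separate low coverage from ambiguous alignment.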
Background: While age and the APOE ε4 allele are major risk factors for Alzheimer’s disease (AD), a small percentage of individuals with these risk factors exhibit AD resilience by living well beyond 75 years of age without any clinical symptoms of cognitive decline. Methods: We used over 200 “AD resilient” individuals and an innovative, pedigree-based approach to identify genetic variants that segregate with AD resilience. First, we performed linkage analyses in pedigrees with resilient individuals and a statistical excess of AD deaths. Second, we used whole genome sequences to identify candidate SNPs in significant linkage regions. Third, we replicated SNPs from the linkage peaks that reduced risk for AD in an independent dataset and in a gene-based test. Finally, we experimentally characterized replicated SNPs. Results: Rs142787485 in RAB10 confers significant protection against AD (p value = 0.0184, odds ratio = 0.5853). Moreover, we replicated this association in an independent series of unrelated individuals (p value = 0.028, odds ratio = 0.69) and used a gene-based test to confirm a role for RAB10 variants in modifying AD risk (p value = 0.002). Experimentally, we demonstrated that knockdown of RAB10 resulted in a significant decrease in Aβ42 (p value = 0.0003) and in the Aβ42/Aβ40 ratio (p value = 0.0001) in neuroblastoma cells. We also found that RAB10 expression is significantly elevated in human AD brains (p value = 0.04). Conclusions: Our results suggest that RAB10 could be a promising therapeutic target for AD prevention. In addition, our gene discovery approach can be expanded and adapted to other phenotypes, thus serving as a model for future efforts to identify rare variants for AD and other complex human diseases. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-017-0486-1) contains supplementary material, which is available to authorized users.