Genome-wide association studies suggest that common genetic variants explain only a small fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability [1,2]. While DNA sequencing costs have fallen dramatically [3], they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of twelve humans. These include eight HapMap individuals representing three populations [4], and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS) [5]. We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases (Mb) of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for monogenic disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of nonsynonymous variants by predicted functional impact.
The incorporation of genomics into medicine is stimulating interest in the return of incidental findings (IFs) from exome and genome sequencing. However, no large-scale study has yet estimated the number of expected actionable findings per individual; therefore, we classified actionable pathogenic single-nucleotide variants in 500 European- and 500 African-descent participants randomly selected from the National Heart, Lung, and Blood Institute Exome Sequencing Project. The 1,000 individuals were screened for variants in 114 genes selected by an expert panel for their association with medically actionable genetic conditions possibly undiagnosed in adults. Among the 1,000 participants, 585 instances of 239 unique variants were identified as disease causing in the Human Gene Mutation Database (HGMD). The primary literature supporting the variants' pathogenicity was reviewed. Of the identified IFs, only 16 unique autosomal-dominant variants in 17 individuals were assessed to be pathogenic or likely pathogenic, and one participant had two pathogenic variants for an autosomal-recessive disease. Furthermore, one pathogenic and four likely pathogenic variants not listed as disease causing in HGMD were identified. These data can provide an estimate of the frequency (∼3.4% for European descent and ∼1.2% for African descent) of the high-penetrance actionable pathogenic or likely pathogenic variants in adults. The 23 participants with pathogenic or likely pathogenic variants were disproportionately of European (17) versus African (6) descent. The process of classifying these variants underscores the need for a more comprehensive and diverse centralized resource to provide curated information on pathogenicity for clinical use to minimize health disparities in genomic medicine.
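The frequency estimates quoted in the abstract above follow directly from its participant counts: 17 of 500 European-descent and 6 of 500 African-descent participants. A minimal sketch of that arithmetic is shown here; the function name `carrier_frequency` and the variable names are illustrative assumptions, not part of the study's analysis pipeline.

```python
def carrier_frequency(carriers: int, cohort_size: int) -> float:
    """Return the percentage of a cohort carrying a reportable variant."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return 100.0 * carriers / cohort_size


# Counts taken from the abstract: 500 participants were sampled per
# ancestry group, and 17 European-descent and 6 African-descent
# participants had a pathogenic or likely pathogenic variant.
COHORT_SIZE = 500
carriers = {"European": 17, "African": 6}

# Percentage of each group with an actionable finding, to one decimal place.
frequencies = {
    group: round(carrier_frequency(count, COHORT_SIZE), 1)
    for group, count in carriers.items()
}
total_carriers = sum(carriers.values())

for group, pct in frequencies.items():
    print(f"{group} descent: {pct}%")
print(f"Participants with a finding: {total_carriers}")
```

The per-group counts sum to the 23 participants reported in the abstract, and the two percentages reproduce its approximate 3.4% and 1.2% figures.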
Recommendations for laboratories to report incidental findings from genomic tests have stimulated interest in such results. To investigate the criteria and processes for assigning the pathogenicity of specific variants and to estimate the frequency of such incidental findings in patients of European and African ancestry, we classified potentially actionable pathogenic single-nucleotide variants (SNVs) in all 4300 European- and 2203 African-ancestry participants sequenced by the NHLBI Exome Sequencing Project (ESP). We considered 112 gene-disease pairs selected by an expert panel as associated with medically actionable genetic disorders that may be undiagnosed in adults. The resulting classifications were compared with classifications from other clinical and research genetic testing laboratories, and with in silico pathogenicity scores. Among European-ancestry participants, 30 of 4300 (0.7%) had a pathogenic SNV and six (0.1%) had a disruptive variant that was expected to be pathogenic, whereas 52 (1.2%) had likely pathogenic SNVs. For African-ancestry participants, six of 2203 (0.3%) had a pathogenic SNV and six (0.3%) had an expected pathogenic disruptive variant, whereas 13 (0.6%) had likely pathogenic SNVs. The Genomic Evolutionary Rate Profiling mammalian conservation score and the Combined Annotation Dependent Depletion summary score of conservation, substitution, regulation, and other evidence were compared across pathogenicity assignments and appear to have utility in variant classification. This work provides a refined estimate of the burden of adult-onset, medically actionable incidental findings expected from exome sequencing, highlights challenges in variant classification, and demonstrates the need for a better curated variant interpretation knowledge base.
Freeman-Sheldon syndrome, or distal arthrogryposis type 2A (DA2A), is an autosomal-dominant condition caused by mutations in MYH3 and characterized by multiple congenital contractures of the face and limbs and normal cognitive development. We identified a subset of five individuals who had been putatively diagnosed with "DA2A with severe neurological abnormalities" and for whom congenital contractures of the limbs and face, hypotonia, and global developmental delay had resulted in early death in three cases; this is a unique condition that we now refer to as CLIFAHDD syndrome. Exome sequencing identified missense mutations in the sodium leak channel, non-selective (NALCN) in four families affected by CLIFAHDD syndrome. We used molecular-inversion probes to screen NALCN in a cohort of 202 distal arthrogryposis (DA)-affected individuals and concurrently performed exome sequencing of six other DA-affected individuals, revealing NALCN mutations in ten additional families with "atypical" forms of DA. All 14 mutations were missense variants predicted to alter amino acid residues in or near the S5 and S6 pore-forming segments of NALCN, highlighting the functional importance of these segments. In vitro functional studies demonstrated that NALCN alterations nearly abolished the expression of wild-type NALCN, suggesting that alterations that cause CLIFAHDD syndrome have a dominant-negative effect. In contrast, homozygosity for mutations in other regions of NALCN has been reported in three families affected by an autosomal-recessive condition characterized mainly by hypotonia and severe intellectual disability. Accordingly, mutations in NALCN can cause either a recessive or dominant condition characterized by varied though overlapping phenotypic features, perhaps based on the type of mutation and affected protein domain(s).
The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47-90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.