SUMMARY Exome sequencing of 343 families, each with a single child on the autism spectrum and at least one unaffected sibling, reveals de novo small indels and point substitutions, which come mostly from the paternal line in an age-dependent manner. We do not see significantly more de novo missense mutations in affected than in unaffected children, but gene-disrupting mutations (nonsense, splice-site, and frameshift) are twice as frequent, 59 versus 28. Based on this difference and the number of recurrent and total targets of gene disruption found in our and similar studies, we estimate that there are between 350 and 400 autism susceptibility genes. Many of the disrupted genes in these studies are associated with the fragile X protein, FMRP, reinforcing links between autism and synaptic plasticity. We find that FMRP-associated genes are under greater purifying selection than the remainder of genes and suggest that they are especially dosage-sensitive targets of cognitive disorders.
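The 59-versus-28 contrast can be turned into a rough excess fraction with simple arithmetic. This is a minimal sketch that assumes comparable numbers of affected and unaffected children were sequenced; it is only one ingredient of the gene-count estimate, which also depends on recurrence counts not reproduced here. The helper name `excess_fraction` is hypothetical.

```python
def excess_fraction(affected: int, unaffected: int) -> float:
    """Share of events in affected children above the sibling background.

    Assumes (hypothetically) equal numbers of affected and unaffected
    children, so raw counts are directly comparable.
    """
    if affected <= 0:
        raise ValueError("affected count must be positive")
    return (affected - unaffected) / affected


# De novo gene-disrupting mutation counts quoted in the summary.
LGD_AFFECTED = 59
LGD_UNAFFECTED = 28

ratio = LGD_AFFECTED / LGD_UNAFFECTED
frac = excess_fraction(LGD_AFFECTED, LGD_UNAFFECTED)
print(f"rate ratio ~ {ratio:.2f}, excess fraction ~ {frac:.2f}")
# prints: rate ratio ~ 2.11, excess fraction ~ 0.53
```

Under that assumption, about 31 of the 59 gene-disrupting mutations in affected children exceed the background rate seen in siblings.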
Pancreatic cancer is projected to become the second leading cause of cancer-related death in the United States by 2020. A familial aggregation of pancreatic cancer has been established, but the cause of this aggregation in most families is unknown. To determine the genetic basis of susceptibility in these families, we sequenced the germline genomes of 638 familial pancreatic cancer patients. We also sequenced the exomes of 39 familial pancreatic adenocarcinomas. Our analyses support the role of previously identified familial pancreatic cancer susceptibility genes such as BRCA2, CDKN2A and ATM, and identify novel candidate genes harboring rare, deleterious germline variants for further characterization. We also show how somatic point mutations that occur during hematopoiesis can affect the interpretation of genome-wide studies of hereditary traits. Our observations have important implications for the etiology of pancreatic cancer and for the identification of susceptibility genes in other common cancer types.
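Somatic mutations arising in blood can appear in a germline callset as heterozygous calls with an unusually low alternate-allele fraction. As a hedged sketch, not the authors' method, such a call can be flagged when its alternate read count is improbable under the binomial expectation of 0.5 for a true germline heterozygote. The helper names and the `alpha` threshold are hypothetical. A low allele fraction is consistent with a subclonal origin but does not prove it, since mapping or sequencing artifacts can look the same.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def looks_subclonal(alt_reads: int, depth: int, alpha: float = 1e-3) -> bool:
    """Flag a heterozygous call whose alt-read count is improbably low
    for a germline het (expected allele fraction 0.5).

    Hypothetical helper; alpha is an illustrative threshold.
    """
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return binom_cdf(alt_reads, depth, 0.5) < alpha


# 5 alt reads out of 60 is far below the ~30 expected for a germline het.
print(looks_subclonal(5, 60))   # True
print(looks_subclonal(28, 60))  # False
```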
IMPORTANCE Complex disorders, such as bipolar disorder (BD), likely result from the influence of both common and rare susceptibility alleles. While common variation has been widely studied, rare variant discovery has only recently become feasible with next-generation sequencing. OBJECTIVE To use a combined family-based and case-control approach to exome sequencing in BD, using multiplex families as an initial discovery strategy, followed by association testing in a large case-control meta-analysis. DESIGN, SETTING, AND PARTICIPANTS We performed exome sequencing of 36 affected members with BD from 8 multiplex families and tested rare, segregating variants in 3 independent case-control samples consisting of 3541 BD cases and 4774 controls. MAIN OUTCOMES AND MEASURES We used penalized logistic regression and 1-sided gene-burden analyses to test for association of rare, segregating damaging variants with BD. Permutation-based analyses were performed to test for overall enrichment with previously identified gene sets. RESULTS We found 84 rare (frequency <1%), segregating variants that were bioinformatically predicted to be damaging. These variants were found in 82 genes that were enriched for gene sets previously identified in de novo studies of autism (19 observed vs 10.9 expected, P = .0066) and schizophrenia (11 observed vs 5.1 expected, P = .0062) and for targets of the fragile X mental retardation protein (FMRP) pathway (10 observed vs 4.4 expected, P = .0076). The case-control meta-analyses yielded 19 genes that were nominally associated with BD based either on individual variants or a gene-burden approach. Although no gene was individually significant after correction for multiple testing, this group of genes continued to show evidence for significant enrichment of de novo autism genes (6 observed vs 2.6 expected, P = .028).
CONCLUSIONS AND RELEVANCE Our results are consistent with the presence of prominent locus and allelic heterogeneity in BD and suggest that very large samples will be required to definitively identify individual rare variants or genes conferring risk for this disorder. However, we also identify significant associations with gene sets composed of previously discovered de novo variants in autism and schizophrenia, as well as targets of the FMRP pathway, providing preliminary support for the overlap of potential autism and schizophrenia risk genes with rare, segregating variants in families with BD.
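The permutation-based enrichment tests in this abstract compare an observed overlap with a gene set against the overlap expected by chance. Below is a minimal sketch, assuming the null model is a uniform random draw of the same number of genes from a fixed gene universe; published analyses often match on gene length or other covariates, which this sketch ignores. The function name and the toy gene lists are hypothetical and do not reproduce the study's data.

```python
import random
from typing import Iterable, Tuple


def permutation_enrichment(
    hits: Iterable[str],
    gene_set: Iterable[str],
    universe: Iterable[str],
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[int, float, float]:
    """One-sided permutation test for over-representation of gene_set among hits.

    Null model (an assumption): the hits are a uniform random sample, of the
    same size, from the gene universe.
    Returns (observed overlap, mean overlap under the null, permutation P value).
    """
    universe = list(universe)
    hits = set(hits)
    gene_set = set(gene_set)
    observed = len(hits & gene_set)
    rng = random.Random(seed)
    total = 0
    at_least = 0
    for _ in range(n_perm):
        draw = rng.sample(universe, len(hits))
        k = sum(1 for g in draw if g in gene_set)
        total += k
        if k >= observed:
            at_least += 1
    expected = total / n_perm
    # Add-one correction keeps the P value strictly positive.
    p_value = (at_least + 1) / (n_perm + 1)
    return observed, expected, p_value


# Toy example with hypothetical gene names.
universe = [f"G{i}" for i in range(1000)]
gene_set = universe[:100]
hits = universe[:20] + universe[500:562]  # 82 hits, 20 of them in gene_set
obs, exp, p = permutation_enrichment(hits, gene_set, universe)
print(f"{obs} observed vs {exp:.1f} expected, P = {p:.4f}")
```

With 82 toy hits drawn from a 1,000-gene universe containing a 100-gene set, the null expectation is close to 82 × 100 / 1000 = 8.2, so an observed overlap of 20 yields a small permutation P value.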
Background The processing and analysis of the large-scale data generated by next-generation sequencing (NGS) experiments are challenging and constitute a burgeoning area of new methods development. Several new bioinformatics tools have been developed for calling sequence variants from NGS data. Here, we validate the variant calling of these tools and compare their relative accuracy to determine which data processing pipeline is optimal. Results We developed a unified pipeline for processing NGS data that encompasses four modules: mapping, filtering, realignment and recalibration, and variant calling. We processed 130 subjects from an ongoing whole-exome sequencing study through this pipeline. To evaluate the accuracy of each module, we conducted a series of comparisons between the single nucleotide variant (SNV) calls from the NGS data and either gold-standard Sanger sequencing on a total of 700 variants or array genotyping data on a total of 9,935 single-nucleotide polymorphisms. A head-to-head comparison showed that the Genome Analysis Toolkit (GATK) provided more accurate calls than SAMtools (positive predictive value of 92.55% vs. 80.35%, respectively). Realignment of mapped reads and recalibration of base quality scores before SNV calling proved crucial to accurate variant calling. The GATK HaplotypeCaller algorithm outperformed the UnifiedGenotyper algorithm for variant calling. We also showed that mapping quality, read depth, and allele balance are related to SNV call accuracy. However, if best practices are used in data processing, additional filtering based on these metrics provides little gain, and accuracies of >99% are achievable. Conclusions Our findings will help determine the best approach for processing NGS data to confidently call variants for downstream analyses. To enable others to implement and replicate our results, all of our code is freely available at http://metamoodics.org/wes.
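Positive predictive value, the headline metric in the GATK versus SAMtools comparison, is the share of variant calls that the gold standard confirms. The sketch below shows that calculation together with a hypothetical hard filter on the three metrics the abstract names (mapping quality, read depth, allele balance). The class, the function, and every threshold are illustrative assumptions, not values taken from the study or its released pipeline.

```python
from dataclasses import dataclass


@dataclass
class Validation:
    """Counts from comparing NGS SNV calls with a gold standard (e.g. Sanger)."""
    true_pos: int   # calls confirmed by the gold standard
    false_pos: int  # calls the gold standard did not confirm

    @property
    def ppv(self) -> float:
        """Positive predictive value = TP / (TP + FP)."""
        called = self.true_pos + self.false_pos
        if called == 0:
            raise ValueError("no calls to evaluate")
        return self.true_pos / called


def passes_hard_filter(
    mapping_quality: float,
    depth: int,
    allele_balance: float,
    min_mq: float = 40.0,
    min_depth: int = 10,
    ab_range: tuple = (0.2, 0.8),
) -> bool:
    """Hypothetical hard filter on mapping quality, read depth, and allele
    balance. Thresholds are illustrative placeholders, not study values."""
    lo, hi = ab_range
    return (
        mapping_quality >= min_mq
        and depth >= min_depth
        and lo <= allele_balance <= hi
    )


# Toy counts (not the study's data).
print(f"PPV = {Validation(true_pos=90, false_pos=10).ppv:.2%}")  # PPV = 90.00%
print(passes_hard_filter(60.0, 30, 0.48))  # True
```

Consistent with the abstract's conclusion, once realignment and recalibration are applied, a filter like this would be expected to remove few additional false positives.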