Some individuals with autism spectrum disorder (ASD) carry functional mutations rarely observed in the general population. We explored the genes disrupted by these variants from joint analysis of protein-truncating (PTV), missense, and copy number variants (CNVs) in a cohort of 63,237 individuals. We discovered 72 ASD risk genes at false discovery rate (FDR)≤0.001 (185 at FDR≤0.05). De novo PTVs, damaging missense variants, and CNVs represented 57.5%, 21.1%, and 8.44% of association evidence, while CNVs conferred greatest relative risk. Meta-analysis with cohorts ascertained for developmental delay (DD, N=91,605) yielded 373 ASD/DD risk genes at FDR≤0.001 (664 at FDR≤0.05), some of which differed in relative frequency of mutation between ASD and DD. The DD-associated genes were enriched in transcriptomes of progenitor and immature neuronal cells whereas genes displaying stronger evidence in ASD were more enriched in maturing neurons and overlapped with schizophreniaassociated genes, emphasizing that these neuropsychiatric disorders share common pathways to risk.
Individuals with autism spectrum disorder (ASD) or related neurodevelopmental disorders (NDDs) often carry disruptive mutations in genes that are depleted of functional variation in the broader population. We build upon this observation and exome sequencing from 154,842 individuals to explore the allelic diversity of rare protein-coding variation contributing risk for ASD and related NDDs. Using an integrative statistical model, we jointly analyzed rare protein-truncating variants (PTVs), damaging missense variants, and copy number variants (CNVs) derived from exome sequencing of 63,237 individuals from ASD cohorts. We discovered 71 genes associated with ASD at a false discovery rate (FDR) ≤ 0.001, a threshold approximately equivalent to exome-wide significance, and 183 genes at FDR ≤ 0.05. Associations were predominantly driven by de novo PTVs, damaging missense variants, and CNVs: 57.4%, 21.2%, and 8.32% of evidence, respectively. Though fewer in number, CNVs conferred greater relative risk than PTVs, and repeat-mediated de novo CNVs exhibited strong maternal bias in parent-of-origin (e.g., 92.3% of 16p11.2 CNVs), whereas all other CNVs showed a paternal bias. To explore how genes associated with ASD and NDD overlap or differ, we analyzed our ASD cohort alongside a developmental delay (DD) cohort from the deciphering developmental disorders study (DDD; n=91,605 samples). We first reanalyzed the DDD dataset using the same models as the ASD cohorts, then performed joint analyses of both cohorts and identified 373 genes contributing to NDD risk at FDR ≤ 0.001 and 662 NDD risk genes at FDR ≤ 0.05. Of these NDD risk genes, 54 genes (125 genes at FDR ≤ 0.05) were unique to the joint analyses and not significant in either cohort alone. Our results confirm overlap of most ASD and DD risk genes, although many differ significantly in frequency of mutation. Analyses of single-cell transcriptome datasets showed that genes associated predominantly with DD were strongly enriched for earlier neurodevelopmental cell types, whereas genes displaying stronger evidence for association in ASD cohorts were more enriched for maturing neurons. The ASD risk genes were also enriched for genes associated with schizophrenia from a separate rare coding variant analysis of 121,570 individuals, emphasizing that these neuropsychiatric disorders share common pathways to risk.
Copy number variants (CNVs) are major contributors to genetic diversity and disease. To date, exome sequencing (ES) has been generated for millions of individuals in international biobanks, human disease studies, and clinical diagnostic screening. While standardized methods exist for detecting short variants (single nucleotide and insertion/deletion variants) using tools such as the Genome Analysis ToolKit (GATK), technical challenges have confounded similarly uniform large-scale CNV analyses from ES data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, the lack of widely-adopted and robustly benchmarked rare CNV discovery tools has presented a barrier to routine exome-wide assessment of this critical class of variation. Here, we introduce GATK-gCNV, a flexible algorithm to discover rare CNVs from genome sequencing read-depth information, which we distribute as an open-source tool packaged in GATK. GATK-gCNV uses a probabilistic model and inference framework that accounts for technical biases while simultaneously predicting CNVs, which enables self-consistency between technical read-depth normalization and variant calling. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data. These analyses demonstrated 97% recall of rare (≤1% site frequency) coding CNVs detected by microarrays and 95% recall of rare coding CNVs discovered by genome sequencing at a resolution of more than two exons. We applied GATK-gCNV to generate a reference catalog of rare coding CNVs in 197,306 individuals with ES from the UK Biobank. We observed strong correlations between CNV rates per gene and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in ES, which can easily be applied across trait association and clinical screening.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.