Copy number variants (CNVs) are a major cause of several genetic disorders, making their detection an essential component of genetic analysis pipelines. Current methods for detecting CNVs from exome-sequencing data are limited by high false-positive rates and low concordance because of inherent biases of individual algorithms. To overcome these issues, calls generated by two or more algorithms are often intersected using Venn diagram approaches to identify “high-confidence” CNVs. However, this approach is inadequate, because it misses potentially true calls that do not have consensus from multiple callers. Here, we present CN-Learn, a machine-learning framework that integrates calls from multiple CNV detection algorithms and learns to accurately identify true CNVs using caller-specific and genomic features from a small subset of validated CNVs. Using CNVs predicted by four exome-based CNV callers (CANOES, CODEX, XHMM, and CLAMMS) from 503 samples, we demonstrate that CN-Learn identifies true CNVs with higher precision (∼90%) and recall (∼85%) and maintains robust performance even when trained with minimal data (∼30 samples). CN-Learn recovers twice as many CNVs as individual callers or Venn diagram–based approaches, with features such as exome capture probe count, caller concordance, and GC content providing the most discriminatory power. In fact, ∼58% of all true CNVs recovered by CN-Learn were either singletons or calls that lacked support from at least one caller. Our study underscores the limitations of current approaches for CNV identification and provides an effective method that yields high-quality CNVs for application in clinical diagnostics.
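To make the contrast between Venn-style consensus and feature-based integration concrete, the following is a minimal sketch of how a "caller concordance" value could be computed for each call and how a hard consensus filter discards singletons. It assumes a 50% reciprocal-overlap rule for matching calls across callers; that rule, the `Call` record, and the function names are all hypothetical illustrations, not CN-Learn's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One CNV call from one caller (hypothetical record layout)."""
    caller: str
    chrom: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    cn_type: str  # "DEL" or "DUP"


def overlap_fraction(a: Call, b: Call) -> float:
    """Reciprocal overlap: the smaller of overlap/len(a) and overlap/len(b)."""
    if a.chrom != b.chrom or a.cn_type != b.cn_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    # Dividing by the longer call yields the smaller of the two per-call fractions.
    return overlap / max(a.end - a.start, b.end - b.start)


def caller_concordance(call: Call, calls: list[Call], min_overlap: float = 0.5) -> int:
    """Count distinct callers (including the call's own) with a matching call of the same type."""
    supporters = {call.caller}
    for other in calls:
        if other.caller != call.caller and overlap_fraction(call, other) >= min_overlap:
            supporters.add(other.caller)
    return len(supporters)


def venn_filter(calls: list[Call], min_callers: int = 2, min_overlap: float = 0.5) -> list[Call]:
    """Venn-style consensus (assumed threshold): keep only calls backed by >= min_callers callers."""
    return [c for c in calls if caller_concordance(c, calls, min_overlap) >= min_callers]


# Toy calls; coordinates are made up for illustration.
calls = [
    Call("XHMM", "chr16", 21_900_000, 22_400_000, "DEL"),
    Call("CANOES", "chr16", 21_950_000, 22_450_000, "DEL"),
    Call("CLAMMS", "chr2", 1_000_000, 1_050_000, "DUP"),  # singleton call
]

for c in calls:
    print(c.caller, caller_concordance(c, calls))
print(len(venn_filter(calls)))  # the singleton CLAMMS call is discarded
```

In CN-Learn, as described above, caller concordance is one feature passed to a learned classifier alongside genomic features such as exome capture probe count and GC content, rather than a hard cutoff; the fixed `min_callers` threshold here only shows why singleton calls are lost under the Venn approach.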
We previously identified a deletion on chromosome 16p12.1 that is mostly inherited and associated with multiple neurodevelopmental outcomes, where severely affected probands carried an excess of rare pathogenic variants compared to mildly affected carrier parents. We hypothesized that the 16p12.1 deletion sensitizes the genome for disease, while “second-hits” in the genetic background modulate the phenotypic trajectory. To test this model, we examined how neurodevelopmental defects conferred by knockdown of individual 16p12.1 homologs are modulated by simultaneous knockdown of homologs of “second-hit” genes in Drosophila melanogaster and Xenopus laevis. We observed that knockdown of 16p12.1 homologs affects multiple phenotypic domains, leading to delayed developmental timing, seizure susceptibility, brain alterations, abnormal dendrite and axonal morphology, and cellular proliferation defects. Compared to genes within the 16p11.2 deletion, which has higher de novo occurrence, 16p12.1 homologs were less likely to interact with each other in Drosophila models or in a human brain-specific interaction network, suggesting that interactions with “second-hit” genes may have a greater impact on neurodevelopmental phenotypes. Assessment of 212 pairwise interactions in Drosophila between 16p12.1 homologs and 76 homologs of patient-specific “second-hit” genes (such as ARID1B and CACNA1A), genes within neurodevelopmental pathways (such as PTEN and UBE3A), and transcriptomic targets (such as DSCAM and TRRAP) identified genetic interactions in 63% of the tested pairs. In 11 out of 15 families, patient-specific “second-hits” enhanced or suppressed the phenotypic effects of one or many 16p12.1 homologs in 32/96 pairwise combinations tested. In fact, homologs of SETD5 synergistically interacted with homologs of MOSMO in both Drosophila and X. laevis, leading to modified cellular and brain phenotypes, as well as axon outgrowth defects that were not observed with knockdown of either individual homolog. Our results suggest that several 16p12.1 genes sensitize the genome towards neurodevelopmental defects, and complex interactions with “second-hit” genes determine the ultimate phenotypic manifestation.
Genetic studies of complex disorders such as autism and intellectual disability (ID) are often based on enrichment of individual rare variants or their aggregate burden in affected individuals compared to controls. However, because of statistical challenges arising from the rarity of such variants and from the combinatorial explosion when enumerating variant combinations, these studies overlook the influence of combinations of rare variants that may not be deleterious on their own, limiting our ability to study the oligogenic basis of these disorders. Here, we present RareComb, a framework that combines the apriori algorithm and statistical inference to identify specific combinations of mutated genes associated with complex phenotypes. RareComb overcomes computational barriers and exhaustively evaluates variant combinations to identify nonadditive relationships between simultaneously mutated genes. Using RareComb, we analyzed 6,189 individuals with autism and identified 718 combinations significantly associated with ID, and carriers of these combinations showed lower IQ than expected in an independent cohort of 1,878 individuals. These combinations were enriched for nervous system genes such as NIN and NGF, showed complex inheritance patterns, and were depleted in unaffected siblings. We found that an affected individual can carry many oligogenic combinations, each contributing to the same phenotype or distinct phenotypes at varying effect sizes. We also used this framework to identify combinations associated with multiple comorbid phenotypes, including mutations of COL28A1 and MFSD2B for ID and schizophrenia and ABCA4, DNAH10 and MC1R for ID and anxiety/depression. Our framework identifies a key component of missing heritability and provides a novel paradigm to untangle the genetic architecture of complex disorders.
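The combination of apriori-style pruning with a statistical test can be illustrated with a simplified sketch restricted to gene pairs in cases only. It assumes a one-sided binomial test of the observed number of co-mutated carriers against the count expected if the two genes were mutated independently; that test, the function names, and the placeholder genes "A" to "D" are hypothetical, and RareComb's actual procedure (including how it uses controls such as unaffected siblings) is not reproduced here.

```python
from itertools import combinations
from math import comb


def binom_sf(k: int, n: int, p: float) -> float:
    """One-sided binomial tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def frequent_pairs(samples: list[set[str]], min_support: int):
    """Apriori-style search limited to gene pairs.

    Downward closure: a pair can only reach `min_support` carriers if each
    gene does, so infrequent genes are pruned before any pair is counted.
    """
    gene_counts: dict[str, int] = {}
    for genes in samples:
        for g in genes:
            gene_counts[g] = gene_counts.get(g, 0) + 1
    frequent = {g for g, c in gene_counts.items() if c >= min_support}

    pair_counts: dict[tuple[str, str], int] = {}
    for genes in samples:
        for pair in combinations(sorted(genes & frequent), 2):
            pair_counts[pair] = pair_counts.get(pair, 0) + 1
    pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    return gene_counts, pairs


def score_pairs(samples: list[set[str]], min_support: int = 2):
    """Return (observed, expected, p-value) per frequent pair, assuming an independence null."""
    n = len(samples)
    gene_counts, pairs = frequent_pairs(samples, min_support)
    results = {}
    for (a, b), observed in pairs.items():
        p_expected = (gene_counts[a] / n) * (gene_counts[b] / n)
        results[(a, b)] = (observed, p_expected * n, binom_sf(observed, n, p_expected))
    return results


# Each set lists the (hypothetical) genes mutated in one affected individual.
samples = [{"A", "B"}, {"A", "B"}, {"A", "B", "C"}, {"C"}, {"D"}, set()]
print(score_pairs(samples))  # only ("A", "B") passes the support threshold
```

Pruning at the single-gene level is what keeps enumeration tractable: the same downward-closure argument extends to triplets and larger combinations, which is the property that lets an apriori-based search avoid scoring every possible combination.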
The contribution of distinct genes to overlapping phenotypes suggests that such genes share ancestral origins, membership in disease pathways, or molecular functions. A recent study by Liu and colleagues identified mutations in TCF20, a paralog of RAI1, among individuals manifesting a novel syndrome with phenotypes similar to those of Smith-Magenis syndrome (a disorder caused by disruption of RAI1). This study highlights how structural similarity among genes contributes to shared phenotypes, and shows how this relationship can contribute to our understanding of the genetic basis of complex disorders.
We previously identified a deletion on chromosome 16p12.1 that is mostly inherited and associated with multiple neurodevelopmental outcomes, where severely affected probands carried an excess of rare pathogenic variants compared to mildly affected carrier parents. We hypothesized that the 16p12.1 deletion sensitizes the genome for disease, while “second hits” in the genetic background modulate the phenotypic trajectory. To test this model, we examined how neurodevelopmental defects conferred by knockdown of individual 16p12.1 homologs are modulated by simultaneous knockdown of homologs of “second hit” genes in Drosophila melanogaster and Xenopus laevis. We observed that knockdown of 16p12.1 homologs affects multiple phenotypic domains, leading to delayed developmental timing, seizure susceptibility, brain alterations, abnormal dendrite and axonal morphology, and cellular proliferation defects. In contrast to genes within the 16p11.2 deletion, which has higher de novo occurrence, 16p12.1 homologs interacted additively with each other and were less connected in a human brain-specific interaction network, suggesting that interactions with second-hit genes have a greater impact on neurodevelopmental phenotypes. Assessment of 358 pairwise interactions in Drosophila between 16p12.1 homologs and 76 homologs of patient-specific “second-hit” genes (such as ARID1B and CACNA1A), genes within neurodevelopmental pathways (such as PTEN and UBE3A), and transcriptomic targets (such as DSCAM and TRRAP) identified both additive (47%) and epistatic (53%) effects. In 11 out of 15 families, homologs of patient-specific “second-hits” showed distinct patterns of interactions, enhancing or suppressing the phenotypic effects of one or many 16p12.1 homologs. In fact, homologs of SETD5 synergistically interacted with homologs of MOSMO in both Drosophila and X. laevis, leading to modified cellular and brain phenotypes, as well as axon outgrowth defects that were not observed with knockdown of either individual homolog. Our results suggest that several 16p12.1 genes sensitize the genome towards neurodevelopmental defects, and complex interactions with “second-hit” genes determine the ultimate phenotypic manifestation.