Purpose: Identifying pathogenic non-coding variants in individuals with developmental disorders (DD) is challenging due to the large search space. It is common to find a single protein-altering variant in a recessive gene in DD patients, but the prevalence of pathogenic non-coding second hits in trans with these variants is unknown. Methods: In 4,073 genetically undiagnosed rare disease trio probands from the 100,000 Genomes Project, we identified rare heterozygous predicted loss-of-function (pLoF) or ClinVar pathogenic variants in recessive DD-associated genes. Using stringent region-specific filtering, we identified rare non-coding variants on the other haplotype. Identified genes were clinically evaluated for phenotypic fit, and where possible, we performed functional testing using RNA-sequencing. Results: We found 2,430 probands with one or more rare heterozygous pLoF or ClinVar pathogenic variants in recessive DD-associated genes, for a total of 3,761 proband-variant pairs. For 1,366 (36.3%) of these pairs, we identified at least one rare non-coding variant in trans. After stringent bioinformatic filtering and clinical review, five were determined to be a good clinical fit (in ALMS1, NPHP3, LAMA2, IGHMBP2 and GAA). Conclusion: We developed a pipeline to systematically identify and annotate compound heterozygous coding/non-coding genotypes. Using this approach, we uncovered new diagnoses and conclude that this mechanism is a rare cause of DDs.
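The abstract does not state how phase was established, but trio data generally allow two heterozygous proband variants to be placed on opposite haplotypes when each is inherited from a different parent. The following is a minimal sketch of that idea, not the authors' pipeline. The `TrioGenotype` class and the `parent_of_origin` and `in_trans` functions are hypothetical names introduced for illustration, and the logic is deliberately conservative: any site where both parents or neither parent carries the alternate allele is treated as unphased.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrioGenotype:
    """Alternate-allele counts (0, 1 or 2) at one site (hypothetical structure)."""
    proband: int
    mother: int
    father: int


def parent_of_origin(gt: TrioGenotype) -> Optional[str]:
    """Return 'maternal' or 'paternal' for a heterozygous proband variant.

    Returns None when the proband is not heterozygous, when both parents
    carry the allele, or when neither does (apparent de novo), because the
    transmitting parent cannot then be read off by this simple rule.
    """
    if gt.proband != 1:
        return None
    mother_carries = gt.mother > 0
    father_carries = gt.father > 0
    if mother_carries and not father_carries:
        return "maternal"
    if father_carries and not mother_carries:
        return "paternal"
    return None


def in_trans(coding: TrioGenotype, noncoding: TrioGenotype) -> Optional[bool]:
    """True if the two variants sit on different parental haplotypes,
    False if on the same one, None if either cannot be phased."""
    origin_a = parent_of_origin(coding)
    origin_b = parent_of_origin(noncoding)
    if origin_a is None or origin_b is None:
        return None
    return origin_a != origin_b


if __name__ == "__main__":
    plof = TrioGenotype(proband=1, mother=1, father=0)       # maternally inherited
    noncoding = TrioGenotype(proband=1, mother=0, father=1)  # paternally inherited
    print(in_trans(plof, noncoding))
```

In this sketch, a pair is only reported as compound heterozygous when `in_trans` returns `True`; pairs returning `None` would need read-backed or statistical phasing, which is outside the scope of the example.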
The ability to predict disease association in human genes is enhanced by an evolutionary understanding. Importantly, genes linked with heritable disease, particularly dominant disorders, tend to have undergone duplication in our early vertebrate ancestors, with a strong asymmetric relationship between disease association within duplicate/paralog pairs. Using a novel phylogenetic approach, alongside a whole-genome comparative analysis, we show that, contrary to the accepted compensatory model of disease evolution, the majority of disease associations reside with the more evolutionarily constrained gene, inferred to most closely resemble the progenitor. This indicates that the strong association between paralogs, specifically ohnologs, and dominant disorders is often a consequence of a mechanism through which pre-existing dosage-sensitive/haploinsufficient genes are successfully duplicated and retained. Heritable disease is thus as much a consequence of the fragility of evolutionarily more ancient genes as of compensatory mechanisms. From these findings, we demonstrate the utility of a new model with which to predict disease-associated genes in the human genome.