Next-generation sequencing has led to the identification of many potential novel disease genes. The presence of mutations in the same gene in multiple unrelated patients is, however, a priori insufficient to establish that the gene is truly involved in the respective disease. Here, we show how phenotype information can be incorporated into statistical approaches to provide additional evidence for the causality of mutations. We developed a broadly applicable statistical model that integrates gene-specific mutation rates, cohort size, mutation type, and phenotype frequency information to assess the chance of identifying de novo mutations affecting the same gene in multiple patients with shared phenotype features. We demonstrate our approach using the frequencies of phenotype features observed in a unique cohort of 6,149 patients with intellectual disability. We show that our combined approach can decrease the number of patients required to identify novel disease genes, especially for patients with combinations of rare phenotypes. In conclusion, integrating genotype-phenotype information can aid significantly in the interpretation of de novo mutations in potential novel disease genes.
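To make the idea concrete, the following is a minimal sketch of how mutation rate, cohort size, and phenotype frequency could enter one recurrence calculation. It is not the model described above: it assumes, purely for illustration, that the number of de novo mutations in a gene follows a Poisson distribution, that the supplied mutation rate is per chromosome per generation (hence the factor of 2), and that the phenotype feature is chosen before looking at the mutation carriers. Mutation type is ignored. The function names `poisson_tail` and `recurrence_p_value`, the mutation rate, and the phenotype frequency are hypothetical; only the cohort size of 6,149 comes from the text.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    # Sum P(X = 0) .. P(X = k - 1) and take the complement.
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def recurrence_p_value(k: int, cohort_size: int, mutation_rate: float,
                       phenotype_freq: float = 1.0) -> float:
    """Chance of seeing >= k de novo mutations in one gene by coincidence.

    Assumptions (illustrative, not taken from the source):
    - mutation_rate is per chromosome per generation, so it is doubled;
    - only patients with the phenotype feature are counted, so the
      effective cohort is cohort_size * phenotype_freq.
    """
    if not 0.0 < phenotype_freq <= 1.0:
        raise ValueError("phenotype_freq must be in (0, 1]")
    expected = 2.0 * cohort_size * phenotype_freq * mutation_rate
    return poisson_tail(k, expected)


if __name__ == "__main__":
    cohort = 6149     # cohort size reported in the text
    rate = 1e-5       # hypothetical gene-specific mutation rate
    # Genotype only: 3 recurrent mutations anywhere in the cohort.
    print(recurrence_p_value(3, cohort, rate))
    # Same 3 mutations, all in patients sharing a feature seen in a
    # hypothetical 5% of the cohort.
    print(recurrence_p_value(3, cohort, rate, phenotype_freq=0.05))
```

Under these assumptions, a rarer shared feature shrinks the effective cohort and therefore the probability that the same number of recurrent mutations arose by chance, which is the intuition behind needing fewer patients when phenotypes are rare.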