Background
Missense mutations in the first five exons of F9, which encodes factor IX (FIX), represent 40% of all mutations that cause hemophilia B. To address the ongoing debate regarding in silico identification of disease-causing mutations in these exons, we analyzed 215 missense mutations from www.factorix.org using six in silico prediction tools. These are the most commonly used programs for predicting the impact of mutations on protein structure and function, with the further advantage of using similar approaches. We developed different algorithms to integrate multiple predictions from these tools. To examine structural effects on FIX, we modeled five selected pathogenic mutations.
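The idea of integrating several tools into one call can be illustrated with a minimal sketch. It assumes each tool's output has already been reduced to a binary call (1 = deleterious, 0 = neutral) and combines the calls by a weighted vote. The function name, the weights, and the 0.5 threshold are hypothetical choices for illustration; this is not the definition of wgP4, which the abstract does not give.

```python
from typing import Mapping


def weighted_call(calls: Mapping[str, int],
                  weights: Mapping[str, float],
                  threshold: float = 0.5) -> bool:
    """Return True when the weighted fraction of deleterious calls
    reaches the threshold (hypothetical rule, for illustration only)."""
    total = sum(weights[tool] for tool in calls)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[tool] * calls[tool] for tool in calls) / total
    return score >= threshold


# Hypothetical equal weights for four of the tools named in the abstract.
example_weights = {
    "SIFT": 1.0,
    "PolyPhen2_HumDiv": 1.0,
    "SNAP2": 1.0,
    "MutationAssessor": 1.0,
}
# Made-up calls for a single variant: three tools say deleterious, one neutral.
example_calls = {
    "SIFT": 1,
    "PolyPhen2_HumDiv": 1,
    "SNAP2": 0,
    "MutationAssessor": 1,
}

print(weighted_call(example_calls, example_weights))
```

Giving more accurate tools a larger weight is one way such a function could differ from a simple majority vote.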
Results
SIFT, PolyPhen-2 HumDiv, SNAP2, and MutationAssessor were the most successful in identifying true non-causative and causative mutations. A proposed function integrating these algorithms (wgP4) was more sensitive (90.1%), specific (22.6%), and accurate (87%) than similar functions, and identified 187 variants as deleterious. Clinical phenotype was significantly associated with predicted causative mutations in all five exons. However, PolyPhen-2 HumDiv was more successful in linking clinical severity to specific exons, while functions that integrate 4–6 predictions were more successful in linking phenotype to genotype in the light chain (exons 3–5). The most important value of integrating multiple predictions is the inclusion of scores derived from different approaches. Modeling of protein structure showed the effects of pathogenic nsSNPs on the structure and function of FIX.
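For reference, sensitivity, specificity, and accuracy follow from the counts of true and false positives and negatives in the standard way. The sketch below uses made-up counts, not the study's data, and the function name is an assumption.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    positives = tp + fn   # variants that are truly causative
    negatives = tn + fp   # variants that are truly non-causative
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one causative and one non-causative variant")
    sensitivity = tp / positives            # causative variants correctly flagged
    specificity = tn / negatives            # non-causative variants correctly cleared
    accuracy = (tp + tn) / (positives + negatives)
    return sensitivity, specificity, accuracy


# Illustrative counts only (not taken from the paper).
sens, spec, acc = classification_metrics(tp=90, fp=15, tn=5, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Accuracy is a prevalence-weighted average of sensitivity and specificity, so a data set dominated by causative variants can show high accuracy alongside low specificity.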
Conclusions
A simple function that integrates information from different in silico programs yields the best prediction of mutated phenotypes. However, the specificity, sensitivity, and accuracy of genotype-phenotype predictions depend on specific characteristics of the protein domain and the disease of interest, as we validated by structural analysis of selected pathogenic F9 mutations. The proposed integrating function (wgP4) might be useful for analyzing the impact of nsSNPs in other genes.
Electronic supplementary material
The online version of this article (10.1186/s12859-019-2919-x) contains supplementary material, which is available to authorized users.