Background
An emerging standard of care for long QT syndrome (LQTS) uses clinical genetic testing to identify variants in the KCNQ1 potassium channel. However, interpretation of genetic test results is confounded by “variants of unknown significance” (VUS), for which there is inadequate evidence of pathogenicity.
Methods and Results
In this study, we curated from the literature a “high-quality” set of 107 functionally characterized KCNQ1 variants. Based on this dataset, we carried out a detailed quantitative analysis of the sequence conservation patterns of KCNQ1 subdomains and the distribution of pathogenic variants within them. We found that conserved subdomains are generally critical for channel function and are enriched in dysfunctional variants. Using this experimentally validated dataset, we trained a neural network, designated Q1VarPred, specifically to predict the functional impact of KCNQ1 VUS. Q1VarPred achieved an estimated Matthews correlation coefficient of 0.581 and an area under the receiver operating characteristic curve of 0.884, outperforming eight previous methods tested in parallel. Q1VarPred is publicly available as a web server at http://meilerlab.org/q1varpred.
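For readers unfamiliar with the two performance measures reported above, the following is a minimal sketch of how the Matthews correlation coefficient (MCC) and the area under the receiver operating characteristic curve (AUC) can be computed for a binary classifier. It is not the Q1VarPred evaluation code. The function names, the 0.5 decision threshold, and the six toy labels and scores are all illustrative assumptions and are unrelated to the 107-variant dataset.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """MCC from binary labels and binary predictions (1 = dysfunctional, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: labels and predicted scores for six variants.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class predictions.
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(f"MCC = {matthews_corrcoef(labels, predictions):.3f}")
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

MCC depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.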
Conclusions
Although a plethora of tools are available for making pathogenicity predictions on a genome-wide scale, previous tools fail to perform robustly when applied to KCNQ1. The contrasting and favorable results for Q1VarPred suggest a promising approach in which a machine learning algorithm is tailored to a specific protein target and trained with a functionally validated dataset to calibrate informatics tools.