Motivation
Improvements in next-generation sequencing have enabled genome-based diagnosis for patients with genetic diseases. However, accurate interpretation of human variants requires knowledge from a number of clinical cases. Additionally, manual analysis of each variant detected in a patient's genome requires enormous time and effort. To reduce the cost of diagnosis, various computational tools have been developed to predict the pathogenicity of human variants, but the shortage and bias of available clinical data can lead to overfitting of algorithms.
Results
We developed a pathogenicity predictor, 3Cnet, that uses recurrent neural networks to analyse the amino acid context of human variants. As 3Cnet is trained on simulated variants reflecting evolutionary conservation and clinical data, it can find disease-causing variants in patient genomes with 2.2 times greater sensitivity than currently available tools, more effectively discovering pathogenic variants and thereby improving diagnosis rates.
Availability
Code (https://github.com/KyoungYeulLee/3Cnet/) and data (https://zenodo.org/record/4716879#.YIO-xqkzZH1) are freely available to non-commercial users.
Supplementary information
Supplementary data are available at Bioinformatics online.
Purpose: We developed an automated interpretation system covering the whole process of whole-exome sequencing (WES), including raw data processing, variant calling, variant interpretation, and measurement of phenotypic similarity between the patient and each disease. This study investigated the diagnostic yield and clinical utility of this system, which assists clinicians in diagnosing patients with suspected genetic disorders.
Methods: WES was performed on a total of 194 patients (age range 0-68 years) with a suspected genetic disorder. The patient inclusion criteria were delayed development within the age of 5 months, multiple congenital anomalies with dysmorphic features, features strongly suggestive of a monogenic or genetically heterogeneous disorder, or absence of a diagnosis despite previous genetic investigation.
Results: WES reported 180 variants, of which 110 were confirmed by segregation analysis, and 94 patients (48.4%) were diagnosed with 89 genetic disorders. There was no difference in diagnostic rate (48.9%, 71/145 vs. 46.9%, 23/49; P > 0.05) or in duration of the diagnostic odyssey (2.8 ± 3.3 vs. 4.1 ± 5.1; P = 0.293) between the groups with and without genetic testing before WES. There was no significant difference in the distribution of clinical symptoms between patients who were and were not diagnosed with a genetic disorder. Forty-four percent of all patients occupied only 9% of the symptom principal component analysis (PCA) space, and the remaining 56% of patients occupied the other 91%. The two groups had similar genetic variant diversity (P = 0.899).
Conclusion: This study showed an improved diagnostic yield (48.4%) in patients with clinical heterogeneity by using automated variant interpretation. Diverse genetic variations were also observed in patients with similar symptoms. This study highlights the utility of an automated interpretation system for WES in clarifying the differential diagnosis of patients with suspected genetic disorders.
The American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) previously reported standardized guidance for the assessment of genetic variants. One of the criteria, PS4, concerns the prevalence of a variant in a case-control study and is important because of the strength of its evidence for pathogenicity. Despite recent studies addressing gene- and disease-specific probands, applying PS4 to rare variants still has certain limitations. Here, we suggest a generalized method, the Bayesian odds ratio (BayesianOR), applicable to PS4 by decomposing a disease into its symptoms and applying a Bayesian framework. Using this approach, we demonstrate that the original odds ratio can be reproduced from well-studied epilepsy data and verify the applicability of the method to in-house frequencies for various rare diseases. In addition, using in-house data, BayesianOR showed significantly different tendencies across ClinVar pathogenicity classes. Thus, the novel method described here should provide an improved interpretation of sequence variants. Furthermore, we anticipate that it will enhance the diagnosis of patients with rare diseases.
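For orientation, the classical case-control odds ratio that PS4 relies on can be sketched as follows. This is not the BayesianOR method, whose details are not given in the abstract; it is a minimal illustration of the conventional calculation with a Wald 95% confidence interval. The function name and the example counts are hypothetical. The sketch also shows why rare variants are problematic: when any cell of the 2x2 table is zero, the estimate is undefined.

```python
import math

def odds_ratio_with_ci(case_carriers, case_total,
                       control_carriers, control_total, z=1.96):
    """Classical odds ratio with a Wald confidence interval.

    Hypothetical helper for illustration only; this is the conventional
    calculation, not the BayesianOR method described in the abstract.
    Returns (odds_ratio, ci_low, ci_high), or None if any cell is zero.
    """
    a = case_carriers                     # cases carrying the variant
    b = case_total - case_carriers        # cases without the variant
    c = control_carriers                  # controls carrying the variant
    d = control_total - control_carriers  # controls without the variant
    if min(a, b, c, d) <= 0:
        # Typical for rare variants: the ratio or its variance is undefined.
        return None
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - z * se_log_or)
    ci_high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, ci_low, ci_high

# Made-up counts: 10 of 100 cases and 2 of 100 controls carry the variant.
print(odds_ratio_with_ci(10, 100, 2, 100))
# A variant never seen in controls yields no usable estimate.
print(odds_ratio_with_ci(3, 100, 0, 100))
```

With the made-up counts above, the odds ratio is about 5.4 and the interval excludes 1, whereas the second call returns `None` because no control carries the variant.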
Thanks to improvements in next-generation sequencing (NGS), genome-based diagnosis for patients with rare diseases has become possible. However, accurate interpretation of human variants requires a massive amount of knowledge gathered from previous research and clinical cases. In addition, manual analysis of each variant in a patient's genome takes enormous time and effort from clinical experts and medical doctors. Therefore, to reduce the cost of diagnosis, various computational tools have been developed to predict the pathogenicity of human variants. Nevertheless, conventional tools suffer from a circularity problem: overlapping training data eventually cause overfitting of the algorithms. In this research, we developed a pathogenicity predictor, named 3Cnet, that uses deep recurrent neural networks to analyze the amino acid context of a missense mutation. 3Cnet uses knowledge transfer from evolutionary conservation to train on limited clinical data without overfitting. The performance comparison shows that 3Cnet can find the true disease-causing variant among a large number of missense variants in a patient's genome with higher sensitivity (recall = 13.9%) than other prediction tools such as REVEL (recall = 7.5%) or PrimateAI (recall = 6.4%). Consequently, 3Cnet can improve the diagnostic rate for patients and discover novel pathogenic variants with high probability.
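The relative gain in sensitivity implied by these recall values can be checked with a short calculation. The sketch below uses only the three recall figures quoted above; the dictionary and function names are illustrative and do not come from the 3Cnet code base.

```python
# Recall values (in percent) as quoted in the abstract above.
recall_percent = {"3Cnet": 13.9, "REVEL": 7.5, "PrimateAI": 6.4}

def recall_ratio(tool, baseline, recalls=recall_percent):
    """Return how many times higher the recall of `tool` is than `baseline`."""
    return recalls[tool] / recalls[baseline]

for baseline in ("REVEL", "PrimateAI"):
    ratio = recall_ratio("3Cnet", baseline)
    print(f"3Cnet vs {baseline}: {ratio:.2f}x")
# prints:
# 3Cnet vs REVEL: 1.85x
# 3Cnet vs PrimateAI: 2.17x
```

On these figures, the recall of 3Cnet is roughly 1.9 times that of REVEL and roughly 2.2 times that of PrimateAI.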