Copy number variants (CNVs) play an important role in many biological processes, including the development of genetic diseases, making them attractive targets for genetic analyses. Interpreting the effect of these structural variants is challenging because the number of genes, regulatory elements, and other genomic elements affected by a CNV is highly variable. This has led to demand for interpretation tools that would relieve researchers, laboratory diagnosticians, genetic counselors, and clinical geneticists of the laborious process of annotating and classifying CNVs. We designed and validated a prediction method (ISV; Interpretation of Structural Variants) based on boosted trees that takes into account annotations of CNVs from several publicly available databases. The presented approach achieved more than 98% prediction accuracy on both copy number loss and copy number gain variants, while also allowing CNVs to be assigned “uncertain” significance in predictions. We believe that ISV’s prediction capability and explainability have great potential to guide users to more precise interpretations and classifications of CNVs.
Motivation: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses, but is limited by technology, mainly because STRs often exceed the read length. Nanopore sequencing, as one of the long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is, however, particularly unreliable in repeating regions, and therefore direct analysis of raw nanopore data is required. Results: Here we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that it decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. Availability: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr
Motivation: Short tandem repeats (STRs) are regions of a genome containing many consecutive copies of the same short motif, possibly with small variations. Analysis of STRs has many clinical uses, but is limited by technology, mainly because STRs often exceed the read length. Nanopore sequencing, as one of the long-read sequencing technologies, produces very long reads, thus offering more possibilities to study and analyze STRs. Basecalling of nanopore reads is, however, particularly unreliable in repeating regions, and therefore direct analysis of raw nanopore data is required. Results: Here we present WarpSTR, a novel method for characterizing both simple and complex tandem repeats directly from raw nanopore signals using a finite-state automaton and a search algorithm analogous to dynamic time warping. By applying this approach to determine the lengths of 241 STRs, we demonstrate that it decreases the mean absolute error of the STR length estimate compared to basecalling and STRique. Availability: WarpSTR is freely available at https://github.com/fmfi-compbio/warpstr Supplementary information: Supplementary data are available at Bioinformatics online.
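To make the idea of a "search algorithm analogous to dynamic time warping" concrete, here is a minimal sketch of classic dynamic time warping (DTW) used to estimate a repeat count from a signal. It is not WarpSTR's actual method, which the abstract describes as combining a finite-state automaton with a DTW-like search. The function names, the `unit_levels` template, and the brute-force loop over candidate repeat counts are all hypothetical, chosen only for illustration.

```python
from math import inf


def dtw_distance(signal, template):
    """Classic DTW cost between two numeric sequences (absolute difference)."""
    n, m = len(signal), len(template)
    # cost[i][j] = best cost of aligning signal[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - template[j - 1])
            # Extend the cheapest of: stretch the template, stretch the signal, or step both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def estimate_repeat_count(signal, unit_levels, max_repeats):
    """Return the repeat count whose expected template aligns best with the signal.

    unit_levels is a hypothetical list of expected signal levels for one
    repeat unit; a real pipeline would derive these from a pore model.
    """
    best_count, best_cost = 0, inf
    for count in range(1, max_repeats + 1):
        c = dtw_distance(signal, unit_levels * count)
        if c < best_cost:  # strict comparison keeps the smallest best count
            best_count, best_cost = count, c
    return best_count


# Toy example: four repeats of a three-level unit, each level observed twice
unit = [1.0, 5.0, 3.0]
toy_signal = [level for _ in range(4) for level in unit for _ in range(2)]
print(estimate_repeat_count(toy_signal, unit, max_repeats=6))
```

The warping step lets each expected level absorb a variable number of raw samples, which is why this family of algorithms suits signals where the dwell time per base varies. Real nanopore signals are noisy and need normalization, which this toy example ignores.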
Copy number variants (CNVs) play important roles in many biological processes, including the development of genetic diseases, making them attractive targets for genetic analysis. This has led to demand for interpretation tools that would relieve researchers, laboratory diagnosticians, genetic counselors and clinical geneticists of the laborious process of annotating and classifying CNVs. Here we demonstrate that the prediction of the clinical impact of CNVs can be automated using modern machine learning methods applied to publicly available genomic annotations, requiring only basic input information about the genomic location and structural type (duplication/deletion) of the analyzed CNV. The presented approach achieved 0.95 prediction accuracy on deletions and 0.96 on duplications from the ClinVar dataset and therefore has great potential to guide users to more precise conclusions.
Clinical interpretation of copy number variants (CNVs) is a complex process that requires skilled clinical professionals. General recommendations have recently been released to guide CNV interpretation based on predefined criteria and thereby standardize the decision process. Several semiautomatic computational methods have been proposed to recommend appropriate choices, relieving clinicians of tedious searching in vast genomic databases. We have developed and evaluated such a tool, called MarCNV, and tested it on CNV records collected from the ClinVar database. Alternatively, emerging machine learning-based tools, such as the recently published ISV (Interpretation of Structural Variants), have shown promising ways toward even fully automated predictions using a broader characterization of affected genomic elements. Such tools utilize features additional to the ACMG criteria, thus providing supporting evidence and the potential to improve CNV classification. Since both approaches contribute to the evaluation of the clinical impact of CNVs, we propose a combined solution in the form of a decision support tool based on automated ACMG guidelines (MarCNV) supplemented by a machine learning-based pathogenicity prediction (ISV) for the classification of CNVs. We provide evidence that such a combined approach is able to reduce the number of uncertain classifications and reveal potentially incorrect classifications using automated guidelines. CNV interpretation using MarCNV, ISV, and the combined approach is available for non-commercial use at https://predict.genovisio.com/.
To explore the genomic pool of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) during the pandemic, the Ministry of Health of the Slovak Republic formed a genomics surveillance workgroup, and the Public Health Authority of the Slovak Republic launched systematic national epidemiological surveillance using whole-genome sequencing (WGS). Six out of seven genomic centers implementing Illumina sequencing technology were involved in the national SARS-CoV-2 virus sequencing program. Here we analyze a total of 33,024 SARS-CoV-2 isolates collected from the Slovak population from 1 March 2021 to 31 March 2022 that were sequenced and analyzed in a consistent manner. Overall, 28,005 out of 30,793 successfully sequenced samples met the criteria to be deposited in the global GISAID database. During this period, we identified four variants of concern (VOC): Alpha (B.1.1.7), Beta (B.1.351), Delta (B.1.617.2) and Omicron (B.1.1.529). In detail, we observed 165 lineages in our dataset, with Alpha, Delta and Omicron dominating three major consecutive incidence waves. This study aims to describe the results of a routine but high-level SARS-CoV-2 genomic surveillance program. Our study of SARS-CoV-2 genomes in collaboration with the Public Health Authority of the Slovak Republic also helped to inform the public about the epidemiological situation during the pandemic.
Clinical interpretation of copy number variants (CNVs) is a complex process that requires skilled clinical professionals. General recommendations have recently been released to guide CNV interpretation based on predefined criteria and thereby standardize the decision process. Several semiautomatic computational methods have been proposed to recommend appropriate choices, relieving clinicians of the tedious search in vast genomic databases. We have developed and evaluated such a tool, called MarCNV, and tested it on CNV records collected from the ClinVar database. Alternatively, emerging machine learning-based tools, such as the recently published ISV (Interpretation of Structural Variants), have shown promising ways toward even fully automated predictions using a wider characterization of affected genomic elements. Such tools utilize features that are additional to the ACMG criteria; thus, they have the potential to significantly improve and/or provide supportive evidence for accurate CNV classification. Since both approaches contribute to the evaluation of the clinical impact of CNVs, we propose a combined solution in the form of a decision support tool based on automated ACMG guidelines (MarCNV) supplemented by a machine learning-based pathogenicity prediction (ISV) for classification of CNVs. We provide evidence that such a combined approach is able to reduce the number of uncertain classifications and reveal potentially incorrect classifications using automated guidelines. CNV interpretation using MarCNV, ISV, and the combined approach is available for non-commercial use at https://predict.genovisio.com/.