Deep learning-based variant callers are becoming the standard and have achieved superior SNP calling performance using long reads. In this paper, we present Clair3, which combines the strengths of two major method categories: pileup calling handles most variant candidates with speed, and full-alignment calling tackles complicated candidates to maximize precision and recall. Clair3 ran faster than any of the other state-of-the-art variant callers and performed the best, especially at lower coverage.

Main Text

The first preprint of DeepVariant [1] was released in late 2016, marking the beginning of the use of deep learning-based methods (DL methods) instead of traditional statistical methods for variant calling. Over the years, several DL methods have been developed, and we are now witnessing a complete takeover of short-read variant calling, led by DeepVariant. Long-read variant calling using Oxford Nanopore (ONT) data, on the other hand, has been dominated by DL methods from the beginning, primarily because of the difficulty caused by its generally higher base error rate. Although DL methods for short reads and long reads have much in common, long-read variant calling is considered the more difficult problem. This has led to two major designs, using either a pileup or a full alignment as the input to the decision-making neural network, which differ significantly in both performance and speed. Long-read variant callers including Clairvoyante [2], Clair [3], and Nanocaller [4] are pileup-based: the read alignments are summarized into features and counts before being fed into a variant calling network. PEPPER-Margin-DeepVariant [5] (PEPPER) is full-alignment-based: the input to its DeepVariant calling network retains the spatial information of the read alignments and is tens of times larger than a pileup input. Medaka [6] is consensus-based; it uses pileup input to generate a diploid consensus in the first iteration and two haploid consensuses in the second.
The differences between the reference and the consensuses are identified and combined into variants. These are all state-of-the-art algorithms: the pileup-based algorithms are usually superior in time efficiency, whereas the full-alignment algorithms provide the best precision and recall. However, although the two designs are not mutually exclusive, no studies have combined pileup calling and full-alignment calling.

To fill the gap, we developed Clair3, the successor to Clair, which combines the strengths of both designs. It runs as fast as the pileup-based callers and performs as well as the full-alignment-based callers. Supplementary Figure 1 shows the workflow of Clair3. The philosophy behind Clair3 is to trust the full-alignment model unless the pileup model can make a quick but reliable decision. First, the pileup calling network processes all variant candidates that pass a coverage threshold and an alternative allele frequency threshold. Next, the high-quality pileup calls are used to phase the alignments and are included in the final output. Then, ...
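The difference between the two input designs can be illustrated with a toy example. The sketch below is not Clair3's actual feature encoding; it assumes a hypothetical read format of (start, seq) pairs aligned to the reference without insertions or deletions, and the function names are illustrative only. The pileup view collapses all reads into per-position allele counts, while the full-alignment view keeps one row per read.

```python
from collections import Counter

# Hypothetical toy input: each read is (start, seq), aligned to the reference
# without insertions or deletions. Real callers also encode indels, strand,
# and base/mapping qualities, which this sketch omits.


def pileup_counts(reads, ref_start, ref_end):
    """Pileup view: collapse all reads into per-position allele counts."""
    columns = {pos: Counter() for pos in range(ref_start, ref_end)}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos in columns:
                columns[pos][base] += 1
    return columns


def full_alignment_rows(reads, ref_start, ref_end, pad="-"):
    """Full-alignment view: keep one row per read, padding uncovered positions."""
    rows = []
    for start, seq in reads:
        row = []
        for pos in range(ref_start, ref_end):
            offset = pos - start
            row.append(seq[offset] if 0 <= offset < len(seq) else pad)
        rows.append("".join(row))
    return rows


reads = [(0, "ACGT"), (1, "CGAT"), (2, "GT")]
print(pileup_counts(reads, 0, 5)[3])     # Counter({'T': 2, 'A': 1})
print(full_alignment_rows(reads, 0, 5))  # ['ACGT-', '-CGAT', '--GT-']
```

The pileup representation has a fixed size per window (positions times alphabet) regardless of depth, whereas the full-alignment representation grows with the number of reads, which is why full-alignment inputs are much larger and slower to process.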
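Clair3's pileup-first workflow can be sketched as a simple routing step. The main text is truncated after the second step, so this sketch assumes, following the stated philosophy of trusting the full-alignment model unless the pileup model is confident, that low-quality pileup calls are re-called by the full-alignment model. The types (Candidate, Call), the threshold values (min_depth, min_af, min_qual), and the model callables are hypothetical placeholders, not Clair3's real parameters or API; the phasing step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    pos: int
    depth: int      # reads covering the site
    alt_count: int  # reads supporting a non-reference allele


@dataclass
class Call:
    pos: int
    genotype: str
    quality: float


def select_candidates(cands: List[Candidate], min_depth: int = 4,
                      min_af: float = 0.1) -> List[Candidate]:
    """Keep candidates passing a coverage and an alternative allele frequency threshold.

    The default thresholds are hypothetical illustration values.
    """
    return [c for c in cands
            if c.depth >= min_depth and c.depth > 0
            and c.alt_count / c.depth >= min_af]


def call_variants(cands: List[Candidate],
                  pileup_model: Callable[[Candidate], Call],
                  full_alignment_model: Callable[[int], Call],
                  min_qual: float = 20.0) -> List[Call]:
    """Run the fast pileup model first, then re-call uncertain sites (assumed routing)."""
    pileup_calls = [pileup_model(c) for c in select_candidates(cands)]
    # Confident pileup calls are kept as part of the final output.
    accepted = [c for c in pileup_calls if c.quality >= min_qual]
    # Assumption: remaining sites go to the slower full-alignment model.
    # (Clair3 also phases reads with the confident calls first; omitted here.)
    hard_sites = [c.pos for c in pileup_calls if c.quality < min_qual]
    refined = [full_alignment_model(pos) for pos in hard_sites]
    return sorted(accepted + refined, key=lambda call: call.pos)


# Toy stand-ins for the two networks (hypothetical).
cands = [Candidate(100, 30, 15), Candidate(200, 2, 2),
         Candidate(300, 25, 1), Candidate(400, 20, 8)]
result = call_variants(
    cands,
    pileup_model=lambda c: Call(c.pos, "0/1", 30.0 if c.pos == 100 else 5.0),
    full_alignment_model=lambda pos: Call(pos, "1/1", 25.0),
)
print([(c.pos, c.genotype) for c in result])  # [(100, '0/1'), (400, '1/1')]
```

Here site 200 fails the coverage threshold, site 300 fails the allele frequency threshold, site 100 is accepted directly from the pileup model, and site 400 is low quality and is therefore re-called by the full-alignment stand-in. Only the uncertain minority of sites pays the full-alignment cost, which is the source of the speed advantage.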
Summary: Circular consensus sequencing (CCS) reads are promising for the comprehensive detection of structural variants (SVs). However, alignment-based SV calling pipelines are computationally intensive because of the generation of complete read alignments and their post-processing. Herein, we propose a SKeleton-based analysis toolkit for Structural Variation detection (SKSV). Benchmarks on real and simulated datasets demonstrate that SKSV is an order of magnitude faster than state-of-the-art SV calling approaches; moreover, it achieves higher F1 scores for various types of SVs.
Availability: SKSV is available from https://github.com/ydLiu-HIT/SKSV.
Supplementary information: Supplementary data are available at Bioinformatics online.
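The F1 score used in such benchmarks is the harmonic mean of precision and recall. A minimal sketch, assuming true positive, false positive, and false negative counts are already available; how an SV call is matched to the truth set (for example, breakpoint distance or size similarity) depends on the benchmarking tool and is not specified in the abstract. The function name and the example counts are illustrative, not SKSV results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall computed from benchmark counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only (not from the SKSV benchmarks).
print(round(f1_score(tp=90, fp=10, fn=30), 4))  # 0.8182
```

Equivalently, F1 = 2TP / (2TP + FP + FN), so a caller cannot raise its F1 by trading many false positives for a few extra true positives.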
Accurate identification of genetic variants from family child–mother–father trio sequencing data is important in genomics. However, state-of-the-art approaches treat variant calling from trios as three independent tasks, which limits their calling accuracy for Nanopore long-read sequencing data. For better trio variant calling, we introduce Clair3-Trio, the first variant caller tailored for family trio data from Nanopore long reads. Clair3-Trio employs a Trio-to-Trio deep neural network model, which takes the sequencing information of the whole trio as input and outputs the predicted variants of all three members within a single model to improve variant calling. We also present MCVLoss, a novel loss function tailor-made for variant calling in trios that leverages an explicit encoding of Mendelian inheritance. Clair3-Trio showed comprehensive improvement in experiments, predicting far fewer variants that violate Mendelian inheritance than current state-of-the-art methods. We also demonstrated that our Trio-to-Trio model is more accurate than competing architectures. Clair3-Trio is accessible as a free, open-source project at https://github.com/HKU-BAL/Clair3-Trio.
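To make "Mendelian inheritance violation" concrete, the sketch below applies a standard autosomal consistency check: a child genotype is consistent if one allele can come from the mother and the other from the father. This is not MCVLoss, whose formulation is not given in the abstract. It assumes diploid VCF-style genotype strings with no missing alleles (a call such as "./." would fail to parse) and ignores de novo mutations and sex chromosomes; the function names are hypothetical.

```python
import re
from typing import Iterable, Tuple


def parse_gt(gt: str) -> Tuple[int, int]:
    """Parse a diploid VCF-style genotype such as '0/1' or '1|0'."""
    a, b = re.split(r"[/|]", gt)
    return int(a), int(b)


def is_mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    c1, c2 = parse_gt(child)
    m, f = parse_gt(mother), parse_gt(father)
    return (c1 in m and c2 in f) or (c2 in m and c1 in f)


def count_violations(sites: Iterable[Tuple[str, str, str]]) -> int:
    """Count (child, mother, father) genotype triples that violate Mendelian inheritance."""
    return sum(not is_mendelian_consistent(c, m, f) for c, m, f in sites)


sites = [("0/1", "0/0", "1/1"),   # consistent
         ("1/1", "0/0", "0/1"),   # violation: the mother carries no alt allele
         ("0|0", "0/1", "0/1")]   # consistent
print(count_violations(sites))  # 1
```

Calling each trio member independently can produce triples like the second site above; a model that sees all three samples jointly can avoid such inconsistent combinations.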