2020
DOI: 10.1093/bioinformatics/btaa1017
|View full text |Cite
|
Sign up to set email alerts
|

Impact of lossy compression of nanopore raw signal data on basecalling and consensus accuracy

Abstract: Motivation Nanopore sequencing provides a real-time and portable solution to genomic sequencing, enabling better assembly, structural variant discovery and modified base detection than second generation technologies. The sequencing process generates a huge amount of data in the form of raw signal contained in fast5 files, which must be compressed to enable efficient storage and transfer. Since the raw data is inherently noisy, lossy compression has potential to significantly reduce space requ… Show more

Help me understand this report
View preprint versions

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1

Citation Types

0
1
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
4
1
1

Relationship

2
4

Authors

Journals

citations
Cited by 6 publications
(3 citation statements)
references
References 22 publications
0
1
0
Order By: Relevance
“…Currently, incremental updates to the ONT base-calling algorithm regularly improve the read accuracy 49 , which suggests that repeating the base calling of older data is valuable. This reanalysis requires long-term storage of the fast5 files, which can be up to 1.5 TB for a single PromethION flow cell, although further compression is possible 50 . By contrast, the PacBio base-calling process is highly mature, and BAM files containing unaligned reads are produced directly from the sequencing machine.…”
Section: Sequencing Logisticsmentioning
confidence: 99%
“…Currently, incremental updates to the ONT base-calling algorithm regularly improve the read accuracy 49 , which suggests that repeating the base calling of older data is valuable. This reanalysis requires long-term storage of the fast5 files, which can be up to 1.5 TB for a single PromethION flow cell, although further compression is possible 50 . By contrast, the PacBio base-calling process is highly mature, and BAM files containing unaligned reads are produced directly from the sequencing machine.…”
Section: Sequencing Logisticsmentioning
confidence: 99%
“…To date, one of the primary challenges associated with ONT's technology has been its relatively high basecalling error rate. Even with the recent introduction of basecallers based on deep learning algorithms, the error rate has decreased to a median value of approximately 5% [33]. Nanopore sequencing can be enhanced through the incorporation of complementary short-read data for error correction, particularly in regions with high GC content or repetitive sequences.…”
Section: Third-generation Sequencingmentioning
confidence: 99%
“…On the other hand, nanopore reads are much longer (often over hundreds of thousands of bases long), and have a much higher error rate, including substitution, insertion, and deletion errors from the basecalling process that converts the raw current signal to the read sequences 4 . However, the error rate has fallen dramatically in the recent years with the advent of deep learning based basecallers which achieve median error rate close to 5% or better 5 , suggesting that a similar approximate assembly approach with some adaptations can be applied to nanopore sequencing reads.…”
Section: Introductionmentioning
confidence: 99%