2016
DOI: 10.1093/nar/gkw540

Boiler: lossy compression of RNA-seq alignments using coverage vectors

Abstract: We describe Boiler, a new software tool for compressing and querying large collections of RNA-seq alignments. Boiler discards most per-read data, keeping only a genomic coverage vector plus a few empirical distributions summarizing the alignments. Since most per-read data is discarded, storage footprint is often much smaller than that achieved by other compression tools. Despite this, the most relevant per-read data can be recovered; we show that Boiler compression has only a slight negative impact on results …

Citations: cited by 5 publications (2 citation statements)
References: 24 publications (28 reference statements)
“…This conversion (which could be applied as a preprocessing step to all reads) works well especially if the insert size (the distance between the two ends) has limited variation across the reads. Additional ideas for handling paired-end reads are discussed in, for instance, 29.…”
Section: Methods (mentioning)
confidence: 99%
“…These BAM files are extremely large and require sophisticated storage solutions. This is because the format stores data on a per-read basis and the space requirement grows almost linearly with the number of reads [55]. A BAM file for a 30x whole genome requires about 80-90 gigabytes of storage.…”
Section: Introduction (mentioning)
confidence: 99%