2016 Data Compression Conference (DCC)
DOI: 10.1109/dcc.2016.39

An Evaluation Framework for Lossy Compression of Genome Sequencing Quality Values

Abstract: This paper provides the specification and an initial validation of an evaluation framework for the comparison of lossy compressors of genome sequencing quality values. The goal is to define reference data, test sets, tools and metrics that shall be used to evaluate the impact of lossy compression of quality values on human genome variant calling. The functionality of the framework is validated referring to two state-of-the-art genomic compressors. This work has been spurred by the current activity within the I…

Cited by 13 publications (15 citation statements)
References 12 publications (16 reference statements)
“…For this individual, the National Institute of Standards and Technology (NIST) released a consensus set of variants which we used for our analyses [24]. Note that similar analyses were conducted in other works [17,1,21]. The selected data sets are shown in Table 2.…”
Section: Results (mentioning)
confidence: 99%
“…When it comes to aligned data, an MPEG-G encoder could use a compression method comparable to that of DeeZ [8], which is able to compress a 437 GB H. Sapiens SAM file to about 63 GB, as compared to 75 GB by CRAM (Scramble) or 106 GB by BAM [8]. Regarding quantization of quality values, methods like QVZ [9] and CALQ [10] could be applied yielding overall compression gains of 10x over BAM, while preserving, or even improving, variant calling performance [11].…”
Section: Compression Capabilities (mentioning)
confidence: 99%
“…Due to their higher entropy and larger alphabet, quality values have proven more difficult to compress than the reads [16,11]. In addition, there is evidence that quality values are inherently noisy, and downstream applications that use them do so in varying heuristic manners.…”
Section: Compression Modes For Quality Values (mentioning)
confidence: 99%
“…In recent years several research groups have investigated methods to improve the effectiveness of compression technologies for the storage of high-throughput sequencing data. In particular, approaches to lossy or quasi-lossless compression of quality scores have received special attention [10][11][12][13], along with an interest to measure their impact in the calling of genomic variants [14,15], so far the sole downstream application tested for evaluation. In the context of gene expression (section 2) this work sets out to explore the effect of lossy compression of quality scores.…”
Section: Introduction (mentioning)
confidence: 99%