2017
DOI: 10.1109/tcbb.2016.2568186

Benchmark Dataset for Whole Genome Sequence Compression

Abstract: The sample dataset and the respective links are available at https://sourceforge.net/projects/benchmarkdnacompressiondataset/.

Citations: cited by 11 publications (5 citation statements)
References: 25 publications
“…The life sciences are becoming "big data companies," and that is setting the standard for addressing that storage problem in the scientific community [43]. Scientists have over the past decade needed the storage space provided by genomic data, yet in the future brain data that is equivalent to world digital information would be difficult to manage [44], [45]. Therefore, there is a need for a modern, efficient approach that will resolve all the challenges of genomic data such as storage space, fast processing, and system throughput [15], [45].…”
Section: Fig 2 Compression Of Input Data Using Cartesian Produc
mentioning
confidence: 99%
“…Scientists have over the past decade needed the storage space provided by genomic data, yet in the future brain data that is equivalent to world digital information would be difficult to manage [44], [45]. Therefore, there is a need for a modern, efficient approach that will resolve all the challenges of genomic data such as storage space, fast processing, and system throughput [15], [45]. Figure 4 depicts the decryption of the encrypted data which is reverse process of encryption.…”
Section: Fig 2 Compression Of Input Data Using Cartesian Produc
mentioning
confidence: 99%
“…Dataset: Standardizing the dataset to be used in the compression tests is one of the main points to ensure that the results achieved by compression tools are comparable. We found in the literature two authors who published datasets for genome compression benchmarking [110,111].…”
Section: Criteria Suggestion For Evaluating Genome Compression Tools
mentioning
confidence: 99%
“…The dataset proposed by [111] consists of sequences from several different organisms, with 1105 prokaryotes, 200 plasmids, 164 viruses, and 65 eukaryotes. Furthermore, the author of this work found a scientific way to select samples for compiling the dataset for the benchmark, using multi-stage sampling strategies.…”
Section: Criteria Suggestion For Evaluating Genome Compression Tools
mentioning
confidence: 99%
“…And will the resulting gains be even worth the trouble of switching? Previous attempts at answering these questions (Zhu et al, 2013; Hosseini et al, 2016; Sardaraz and Tahir, 2016; Biji and Achuthsankar, 2017) are limited by testing too few compressors and by using restricted test data. Therefore we set out to benchmark a broader selection of available compressors on a variety of relevant test data.…”
mentioning
confidence: 99%