2020
DOI: 10.1109/tit.2019.2961265

Coding Over Sets for DNA Storage

Abstract: In this paper, we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of M sequences, each of length L. Errors in this model are the loss of whole sequences and point errors inside the sequences, such as insertions, deletions, and substitutions. We derive Gilbert-Varshamov lower bounds and sphere-packing upper bounds on achievable cardinalities of error-correcting codes within this stor…
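As a rough illustration of the unordered-set model described in the abstract, the sketch below counts how many distinct data sets exist when a data set is a set of M sequences of length L, and compares the resulting number of bits with an ordered list of the same sequences. It assumes a quaternary alphabet {A, C, G, T} and pairwise distinct sequences; the function names are hypothetical and do not come from the paper.

```python
import math

ALPHABET_SIZE = 4  # assumption: quaternary DNA alphabet {A, C, G, T}


def count_unordered_sets(m, length):
    """Number of sets of m distinct sequences, each of the given length."""
    return math.comb(ALPHABET_SIZE ** length, m)


def bits_unordered(m, length):
    """Bits that can be represented by choosing one such unordered set."""
    return math.log2(count_unordered_sets(m, length))


def bits_ordered(m, length):
    """Bits available if the order of the m sequences were preserved."""
    return m * length * math.log2(ALPHABET_SIZE)


if __name__ == "__main__":
    m, length = 3, 4
    print("unordered sets:", count_unordered_sets(m, length))
    print("bits (unordered):", bits_unordered(m, length))
    print("bits (ordered):", bits_ordered(m, length))
```

Under these assumptions, the gap between the two bit counts is close to log2(M!), which is the information lost when the order of the sequences is not preserved.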

Cited by 81 publications (62 citation statements)
References 39 publications
“…The improved insertion and deletion correction can extend the applicability of the framework to sequencing platforms such as nanopore sequencing [28] which have higher insertion and deletion error rates. Another interesting direction is to incorporate ideas from [18] and [29] to reduce the inefficiency of index error correction.…”
Section: Discussion (mentioning)
confidence: 99%
“…In this section, we consider a simplified model for DNA-based storage to develop a better understanding of the coding theoretic tradeoffs. While several previous works such as [15], [16], [18] theoretically analyze various aspects of the DNA-based storage problem (such as the information-theoretic capacity in the asymptotic setting and the optimality of various techniques to recover the order of the oligonucleotides), our main focus is to understand the tradeoff between the writing and reading cost associated with DNA-based storage and to motivate the scheme described in Section 3.…”
Section: Theoretical Analysis (mentioning)
confidence: 99%
“…3(c)). Reed-Solomon outer code: We use a Reed-Solomon (RS) code with field size 2^16 as the outer code to recover lost sequences and to correct any errors left undetected by the CRC. The amount of additional RS redundancy can be chosen to trade off the writing and reading costs [12], and is set to 30% by default.…”
Section: Methods (mentioning)
confidence: 99%
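The redundancy arithmetic in the statement above can be sketched as follows. This is only an illustration under stated assumptions: "30%" is read as parity symbols relative to data symbols (the quoted text does not say which base is meant), the code is a conventional non-extended RS code whose block length is at most 2^16 - 1 symbols, and the helper names are hypothetical rather than taken from the cited work.

```python
MAX_RS_BLOCK_LENGTH = 2**16 - 1  # non-extended RS code over a field of size 2^16


def rs_outer_plan(data_symbols, redundancy_percent=30):
    """Return (block_length, parity_symbols) for an RS outer code.

    Assumption: redundancy is measured relative to the number of data symbols.
    """
    # Integer ceiling of data_symbols * redundancy_percent / 100.
    parity = (data_symbols * redundancy_percent + 99) // 100
    block_length = data_symbols + parity
    if block_length > MAX_RS_BLOCK_LENGTH:
        raise ValueError("block length exceeds 2^16 - 1 symbols")
    return block_length, parity


def can_decode(parity, errors, erasures):
    """Standard RS decoding condition: 2 * errors + erasures <= parity."""
    return 2 * errors + erasures <= parity


if __name__ == "__main__":
    n, parity = rs_outer_plan(1000)
    print("block length:", n, "parity symbols:", parity)
    print("300 lost sequences recoverable:", can_decode(parity, 0, 300))
    print("100 errors + 101 losses recoverable:", can_decode(parity, 100, 101))
```

In this reading, each lost sequence counts as one erasure and each sequence with an undetected error counts as one symbol error, so errors cost twice as much parity as losses.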
“…Recent works have examined various aspects of DNA storage, including error correction [1,5,10,11,12], random access [4,5,13], novel synthesis techniques [14,15] and analysis of the fundamental limits [16,17,18]. While initial works used Illumina sequencing which provides highly accurate short reads, there is growing interest in the use of nanopore sequencing [19] because it is a portable, real-time and low-cost platform that also supports long reads.…”
Section: Introduction (mentioning)
confidence: 99%
“…Related literature: Motivated by DNA-based storage, a few recent works have considered the problem of coding across an unordered set of strings [15][16][17][18]. The setting studied in all these works bears similarities with the one in this paper, but they focus on providing explicit code constructions, as opposed to characterizing the channel capacity, as we do here.…”
Section: Introduction (mentioning)
confidence: 99%