Scaling up DNA data storage and random access retrieval

Organick, Lee; Ang, Siena Dumas; Chen, Yuan Jyue; Lopez, Randolph; Yekhanin, Sergey; Makarychev, Konstantin; Rácz, Miklós Z.; Kamath, Govinda M.; Gopalan, Parikshit; Nguyen, Bichlien H.; Takahashi, Christopher N.; Newman, Sharon; Parker, Hsing Yeh; Rashtchian, Cyrus; Stewart, Kendall; Gupta, Gagan; Carlson, Robert H.; Mulligan, John; Carmean, Douglas M.; Seelig, Georg; Ceze, Luís; Strauß, Karin

doi:10.1101/114553

Cited by 34 publications

(48 citation statements)

References 12 publications


“…We denote by s the maximum number of sequences that are never drawn (or their clusters are not identified), by t the maximum number of sequences, which have been reconstructed with errors with a maximum of ε errors of type E each. Typical error types E after the reconstruction step are insertions, deletions and substitutions, where the latter two are the most prominent ones in DNA storage systems [5]. To be more precise, we define the error balls associated with the channel model.…”

Section: B. DNA Channel Model (mentioning)

confidence: 99%

“…One way to address this problem is using block addresses, also called indices, that are stored as part of the strand. Errors in DNA are typically substitutions, insertions, and deletions, where most published studies report that either substitutions or deletions are the most prominent ones, depending upon the specific technology for synthesis and sequencing [2], [3], [4], [5], [6], [7]. For example, in column-based DNA oligo synthesis the dominant errors are deletions that result from either failure to remove the dimethoxytrityl (DMT) or combined inefficiencies in the coupling and capping steps [4].…”

Section: Introduction (mentioning)

confidence: 99%


Coding Over Sets for DNA Storage

Lenz, Siegel, Wachter-Zeh, et al. 2020

IEEE Trans. Inform. Theory


In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of M sequences, each of length L. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal. Index Terms: coding over sets, DNA data storage, Gilbert-Varshamov bound, insertion and deletion errors, sphere packing bound
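The storage model in this abstract (an unordered set of M length-L sequences, subject to loss of whole sequences and point errors inside sequences) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' channel definition: the function name `dna_channel` and its parameters are hypothetical, and only substitution errors are modeled, whereas the paper also covers insertions and deletions.

```python
import random

ALPHABET = "ACGT"

def dna_channel(stored, max_lost, max_erroneous, max_subs, rng):
    """Hypothetical toy channel: drop up to max_lost sequences and apply
    up to max_subs substitutions to up to max_erroneous of the survivors.
    Insertions and deletions are deliberately left out of this sketch."""
    seqs = sorted(stored)          # fixed starting order for reproducibility
    rng.shuffle(seqs)              # the channel output carries no ordering
    n_lost = rng.randint(0, min(max_lost, len(seqs)))
    received = seqs[n_lost:]       # the first n_lost sequences are never read
    n_err = rng.randint(0, min(max_erroneous, len(received)))
    for k in rng.sample(range(len(received)), n_err):
        bases = list(received[k])
        n_subs = rng.randint(0, min(max_subs, len(bases)))
        for pos in rng.sample(range(len(bases)), n_subs):
            # replace the base with a different symbol of the alphabet
            bases[pos] = rng.choice([b for b in ALPHABET if b != bases[pos]])
        received[k] = "".join(bases)
    return received

# Example: M = 4 sequences of length L = 6
stored = {"ACGTAC", "TTGACA", "GGCATC", "CATGGA"}
received = dna_channel(stored, max_lost=1, max_erroneous=2, max_subs=1,
                       rng=random.Random(0))
print(received)
```

The three bounds in the call play the roles of the loss, erroneous-sequence, and per-sequence error limits that the citing text denotes s, t, and ε.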


“…However, as mentioned, such a problem can be easily resolved using data storage on DNA-carriers. Using synthetic DNA for the storage of numerical information makes it possible to achieve potential information density of 10^9 Gb/mm^3 at the potential durability of thousands of years (6). Thus, the researchers of Microsoft Corporation (Microsoft Research) in co-operation with scientists from University of Washington have stored more than 200 Mb numerical data in the form of DNA (4,6,7,8,9).…”

Section: Introduction (mentioning)

confidence: 99%

“…Using synthetic DNA for the storage of numerical information makes it possible to achieve potential information density of 10^9 Gb/mm^3 at the potential durability of thousands of years (6). Thus, the researchers of Microsoft Corporation (Microsoft Research) in co-operation with scientists from University of Washington have stored more than 200 Mb numerical data in the form of DNA (4,6,7,8,9). In particular, these data contain the encoding of high definition video (10), copies of the Universal Declaration of Human Rights in different languages, the top 100 books from Project Gutenberg, and the Crop Trust seed database (6).…”

Section: Introduction (mentioning)

confidence: 99%

Speckle-interferometry and speckle-correlometry of GB-speckles

Ulyanov 2019

Front Biosci


Introduction 3. Transformation of sequence of nucleotides in gene-based speckle pattern 3.1. Algorithm of re-coding of a nucleotide sequence 3.2. Algorithm of generating of 2D speckle pattern, based on a nucleotide sequence 3.3. Generating of gene-based speckles 4. Comparison of GB-speckles, based on similar genovars: cross-correlation technique 5. Comparison of GB-speckles, based on similar C. trachomatis genovars of different subtypes: speckle-interferometry 6. Optical processing of GB-speckles: detection of genetic mutations in a gene of microorganisms 7. Conclusions 8. Acknowledgment 9. References


“…Since then, several more groups have demonstrated the ability to successfully store data of large scale using DNA molecules; see e.g. [1], [2], [5], [13], [18]. Other works developed coding solutions which are specifically targeted to correct the special types of errors inside DNA-based storage systems [10]- [12], [14], [16]- [18].…”

Section: Introduction (mentioning)

confidence: 99%

Clustering-Correcting Codes

Shinkar, Yaakobi, Lenz, et al. 2019

2019 IEEE International Symposium on Information Theory (ISIT)


A new family of codes, called clustering-correcting codes, is presented in this paper. This family of codes is motivated by the special structure of data that is stored in DNA-based storage systems. The data stored in these systems has the form of unordered sequences, also called strands, and every strand is synthesized thousands to millions of times, where some of these copies are read back during sequencing. Due to the unordered structure of the strands, an important task in the decoding process is to place them in their correct order. This is usually accomplished by allocating a part of the strand for an index. However, in the presence of errors in the index field, important information on the order of the strands may be lost. Clustering-correcting codes ensure that if the distance between the index fields of two strands is small, then there will be a large distance between their data fields. It is shown how this property enables to place the strands together in their correct clusters even in the presence of errors. We present lower and upper bounds on the size of clustering-correcting codes and an explicit construction of these codes which uses only a single bit of redundancy.
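The defining property stated in this abstract (strands whose index fields are close must have data fields that are far apart) can be checked mechanically on a small example. The sketch below is an illustration under assumptions, not the construction from the paper: it assumes Hamming distance as the metric, and the function `violating_pairs` and the thresholds `index_radius` and `min_data_distance` are hypothetical names chosen here.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def violating_pairs(strands, index_radius, min_data_distance):
    """Return pairs (i, j) whose index fields are within index_radius
    but whose data fields are closer than min_data_distance.
    Both thresholds are hypothetical parameters for this sketch."""
    bad = []
    for (i, (idx_a, data_a)), (j, (idx_b, data_b)) in combinations(
        enumerate(strands), 2
    ):
        close_index = hamming(idx_a, idx_b) <= index_radius
        close_data = hamming(data_a, data_b) < min_data_distance
        if close_index and close_data:
            bad.append((i, j))
    return bad

# Each strand is an (index field, data field) pair.
strands = [("AA", "ACGT"), ("AC", "TGCA"), ("GT", "ACGA")]
ok_at_radius_1 = violating_pairs(strands, index_radius=1, min_data_distance=3)
bad_at_radius_2 = violating_pairs(strands, index_radius=2, min_data_distance=3)
print(ok_at_radius_1, bad_at_radius_2)
```

With an index radius of 1 the toy set satisfies the property; widening the radius to 2 exposes strands 0 and 2, whose data fields differ in only one position.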
