Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Chang, Zuling; Chrisnata, Johan; Ezerman, Martianus Frederic; Kiah, Han Mao

doi:10.1109/tit.2017.2747557

Cited by 29 publications

(33 citation statements)

References 29 publications

“…The specific idea of manipulating DNA molecules for data storage has been circulating in the scientific community for a few decades, and yet it was not until 2012-2013 that two prototypes were implemented [2], [7]. These prototypes have ignited the imagination of practitioners and theoreticians alike, and many works followed suit with various implementations and channel models [1], [6], [8], [9], [17], [21].…”

Section: Previous Work

mentioning

confidence: 99%

“…The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is asymptotically equal to the amount required in the classical error correcting paradigm. [Footnote 1: The edit distance between two strings is the minimum number of deletions, insertions, and substitutions that turn one into the other.] [Footnote 2: As long as the number of insertions is not equal to the number of deletions, an event that occurs with negligible probability.]…”

mentioning

confidence: 99%

On Coding Over Sliced Information

Sima, Raviv, Bruck (2019)

2019 IEEE International Symposium on Information Theory (ISIT)

The interest in channel models in which the data is sent as an unordered set of binary strings has increased lately, due to emerging applications in DNA storage, among others. In this paper we analyze the minimal redundancy of binary codes for this channel under substitution errors, and provide several constructions, some of which are shown to be asymptotically optimal. The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is asymptotically equal to the amount required in the classical error correcting paradigm. [Footnote 1: The edit distance between two strings is the minimum number of deletions, insertions, and substitutions that turn one into the other.] [Footnote 2: As long as the number of insertions is not equal to the number of deletions, an event that occurs with negligible probability.]
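The edit distance defined in the footnote above can be computed with the standard dynamic-programming recurrence. The following is a minimal sketch for illustration only; the function name `edit_distance` is hypothetical and is not taken from the cited paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of deletions, insertions, and substitutions turning a into b."""
    # prev[j] holds the distance between the first i-1 symbols of a
    # and the first j symbols of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.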

“…No explicit encoding schemes are known for codes that achieve this rate. As noted in [6], $|\{M : \exists\, x \in \{0,1\}^n,\ M_L(x) = M\}|$ is at most equal to the number of $2^L$-compositions of $n - L + 1$, so that…”

Section: Notation and Preliminaries

mentioning

confidence: 99%
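The counting bound in the statement above can be checked by brute force for small parameters. The sketch below rests on two assumptions that the excerpt does not confirm: that M_L(x) denotes the multiset of the n − L + 1 length-L substrings of x, and that a "2^L-composition" allows zero parts (an ordered sum of 2^L nonnegative integers). All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def profile(x: str, L: int) -> Counter:
    """Multiset of the len(x) - L + 1 length-L substrings of x (assumed meaning of M_L(x))."""
    return Counter(x[i:i + L] for i in range(len(x) - L + 1))


def count_profiles(n: int, L: int) -> int:
    """Brute-force count of distinct profiles over all binary strings of length n."""
    seen = set()
    for bits in product("01", repeat=n):
        p = profile("".join(bits), L)
        seen.add(frozenset(p.items()))  # hashable form of the multiset
    return len(seen)


def composition_bound(n: int, L: int) -> int:
    """Ways to write n - L + 1 as an ordered sum of 2**L nonnegative integers."""
    parts = 2 ** L
    total = n - L + 1
    return comb(total + parts - 1, parts - 1)  # stars and bars


if __name__ == "__main__":
    for n, L in [(4, 2), (6, 2), (8, 3)]:
        print(n, L, count_profiles(n, L), composition_bound(n, L))
```

The bound holds because every profile assigns a nonnegative count to each of the 2^L possible substrings, and the counts sum to n − L + 1; not every such assignment arises from an actual string, so the inequality is generally strict.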

“…The first instance of a coded sequence reconstruction problem was studied by Levenshtein [20], who posed the sequence reconstruction problem for strings drawn from an error-correcting codebook. Recently, a new form of coded reconstruction was introduced in [6], [12], [17], with the goal of performing string encodings that enable unique reconstruction based on substring multisets. The problem of interest is to identify efficient coding schemes that convert arbitrary input strings into strings that may be uniquely reconstructed given some predetermined substring and/or subsequence information.…”

Section: Introduction

mentioning

confidence: 99%

Unique Reconstruction of Coded Strings From Multiset Substring Spectra

Gabrys, Milenković (2019)

IEEE Trans. Inform. Theory

The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded string reconstruction from multiset substring spectra, where the strings are restricted to lie in some codebook. In particular, we consider binary codebooks that allow for unique string reconstruction and propose a new method, termed repeat replacement, to create the codebook. Our contributions include algorithmic solutions for repeat replacement and constructive redundancy bounds for the underlying coding schemes. We also consider extensions of the problem to noisy settings in which substrings are compromised by burst and random errors. The study is motivated by applications in DNA-based data storage systems that use high throughput readout sequencers.

“…In order to ensure unique reconstruction, studies were made on reconstruction of encoded sequences [13], [21], [22]. One method that guarantees a unique reconstruction is to encode the information sequence to a codeword that does not contain any k-tuple more than once.…”

Section: Introduction

mentioning

confidence: 99%

Repeat-Free Codes

Elishco, Gabrys, Médard, et al. (2019)

2019 IEEE International Symposium on Information Theory (ISIT)

In this paper we consider the problem of encoding data into repeat-free sequences in which sequences are imposed to contain any k-tuple at most once (for predefined k). First, the capacity and redundancy of the repeat-free constraint are calculated. Then, an efficient algorithm, which uses a single bit of redundancy, is presented to encode length-n sequences for k = 2 + 2 log(n). This algorithm is then improved to support any value of k of the form k = a log(n), for 1 < a, while its redundancy is o(n). We also calculate the capacity of repeat-free sequences when combined with local constraints which are given by a constrained system, and the capacity of multi-dimensional repeat-free codes.
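The repeat-free constraint itself (no k-tuple occurs more than once) is simple to test. The sketch below only checks the constraint on a given sequence; it does not reproduce the encoding algorithm from the abstract, and the name `is_repeat_free` is hypothetical.

```python
def is_repeat_free(seq: str, k: int) -> bool:
    """Return True if every length-k substring of seq occurs at most once."""
    seen = set()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in seen:
            return False  # this k-tuple already appeared at an earlier position
        seen.add(window)
    return True


if __name__ == "__main__":
    print(is_repeat_free("0001011100", 3))
    print(is_repeat_free("0101", 2))
```

A sequence shorter than k has no k-tuples at all, so the check returns True for it.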
