2017
DOI: 10.1109/tit.2017.2747557

Rates of DNA Sequence Profiles for Practical Values of Read Lengths

Abstract: A recent study by one of the authors has demonstrated the importance of profile vectors in DNA-based data storage. We provide exact values and lower bounds on the number of profile vectors for finite values of alphabet size q, read length ℓ, and word length n. Consequently, we demonstrate that for q ≥ 2 and n ≤ q^(ℓ/2−1), the number of profile vectors is at least q^(κn) with κ very close to 1. In addition to enumeration results, we provide a set of efficient encoding and decoding algorithms for each of two particul…
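To make the counting problem in the abstract concrete, here is a minimal brute-force sketch. It assumes the usual reading of a profile vector, namely the count of each length-ℓ substring (ℓ-gram) of a word; the function names `profile` and `count_profiles` and the use of the letters A, C, G, T are illustrative choices, not taken from the paper. It enumerates all q^n words, so it is only usable for very small parameters.

```python
from collections import Counter
from itertools import product


def profile(word, ell):
    """Count the occurrences of each length-ell substring (ell-gram) of word."""
    return Counter(word[i:i + ell] for i in range(len(word) - ell + 1))


def count_profiles(q, ell, n):
    """Brute-force count of distinct profile vectors over all q**n words.

    Assumption: the alphabet is the first q letters of "ACGT", so q <= 4.
    """
    alphabet = "ACGT"[:q]
    seen = set()
    for letters in product(alphabet, repeat=n):
        p = profile("".join(letters), ell)
        # frozenset of (gram, count) pairs is a hashable form of the vector
        seen.add(frozenset(p.items()))
    return len(seen)


if __name__ == "__main__":
    print(profile("ACGAC", 2))
    print(count_profiles(2, 2, 3))
```

For q = 2, ℓ = 2, n = 3 the eight binary words give seven distinct profiles, because the two alternating words share the same pair of 2-grams.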

citations
Cited by 29 publications
(33 citation statements)
references
References 29 publications
“…The specific idea of manipulating DNA molecules for data storage has been circulating in the scientific community for a few decades, and yet it was not until 2012-2013 that two prototypes were implemented [2], [7]. These prototypes have ignited the imagination of practitioners and theoreticians alike, and many works followed suit with various implementations and channel models [1], [6], [8], [9], [17], [21].…”
Section: Previous Work (mentioning)
confidence: 99%
“…The surprising result in this paper is that while the information vector is sliced into a set of unordered strings, the amount of redundant bits that are required to correct errors is asymptotically equal to the amount required in the classical error correcting paradigm. [Footnote 1: The edit distance between two strings is the minimum number of deletions, insertions, and substitutions that turn one into the other.] [Footnote 2: As long as the number of insertions is not equal to the number of deletions, an event that occurs with negligible probability.]…”
mentioning
confidence: 99%
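The edit distance defined in the footnote of the statement above can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook implementation, not code from the cited paper, and the name `edit_distance` is an illustrative choice.

```python
def edit_distance(a, b):
    """Minimum number of deletions, insertions, and substitutions turning a into b."""
    # prev[j] holds the distance between the current prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```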
“…No explicit encoding schemes are known for codes that achieve this rate. As noted in [6], |{M : ∃ x ∈ {0, 1}^n, M_L(x) = M}| is at most equal to the number of 2^L-compositions of n − L + 1, so that…”
Section: Notation and Preliminaries (mentioning)
confidence: 99%
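The composition bound quoted above can be checked numerically for small parameters. The sketch below assumes that M_L(x) denotes the multiset of length-L substrings of x and that a "2^L-composition of n − L + 1" means an ordered sum of 2^L nonnegative integers, counted by the stars-and-bars formula; both readings are assumptions, and the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def composition_bound(n, L):
    """Number of ways to write n - L + 1 as an ordered sum of 2**L nonnegative parts."""
    m, k = n - L + 1, 2 ** L
    return comb(m + k - 1, k - 1)


def count_multisets(n, L):
    """Brute-force count of distinct length-L substring multisets over {0,1}^n."""
    seen = set()
    for bits in product("01", repeat=n):
        x = "".join(bits)
        multiset = Counter(x[i:i + L] for i in range(n - L + 1))
        seen.add(frozenset(multiset.items()))
    return len(seen)


if __name__ == "__main__":
    for n, L in [(3, 2), (4, 1), (5, 2)]:
        print(n, L, count_multisets(n, L), composition_bound(n, L))
```

For L = 1 the bound is tight, since every split of n into a number of zeros and a number of ones is realised by some string; for larger L the brute-force count falls below the bound.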
“…The first instance of a coded sequence reconstruction problem was studied by Levenshtein [20], who posed the sequence reconstruction problem for strings drawn from an error-correcting codebook. Recently, a new form of coded reconstruction was introduced in [6], [12], [17], with the goal of performing string encodings that enable unique reconstruction based on substring multisets. The problem of interest is to identify efficient coding schemes that convert arbitrary input strings into strings that may be uniquely reconstructed given some predetermined substring and/or subsequence information.…”
Section: Introduction (mentioning)
confidence: 99%
“…In order to ensure unique reconstruction, studies were made on reconstruction of encoded sequences [13], [21], [22]. One method that guarantees a unique reconstruction is to encode the information sequence to a codeword that does not contain any k-tuple more than once.…”
Section: Introduction (mentioning)
confidence: 99%
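The condition mentioned in the last statement, a codeword that contains no k-tuple more than once, is easy to test directly. This is a minimal sketch of such a check only; it is not the encoder from any of the cited works, and the name `is_repeat_free` is hypothetical.

```python
def is_repeat_free(word, k):
    """Return True if no length-k substring (k-tuple) occurs more than once in word."""
    seen = set()
    for i in range(len(word) - k + 1):
        gram = word[i:i + k]
        if gram in seen:
            return False
        seen.add(gram)
    return True


if __name__ == "__main__":
    print(is_repeat_free("ACGTAC", 2))
    print(is_repeat_free("ACGTAC", 3))
```

In the example, "ACGTAC" repeats the 2-tuple "AC" but has four distinct 3-tuples, so the same word can fail the condition for one k and satisfy it for a larger one.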