“…• Deriving lower bounds on G_s(k). • Finding the number of distinct s-gapped k-decks, akin to what was done [7]. • Extending the results on coded and hybrid k-decks in [9] for gapped k-decks.…”
Section: Improved Upper Bounds For Gapped K-decks (mentioning)
confidence: 99%
“…The problem of reconstructing strings based on evidence sets of the form of subsequences, substrings or weights of substrings has received significant attention from the theoretical computer science, bioinformatics, and information theory communities alike [1], [3], [4], [8], [10], [11], [13], [15]. One special instance of this class of problems is the k-deck problem [4], [6], [7], [9], [10], [14], of interest due to its connection to trace reconstruction [3], [5] and its applications in DNA-based data storage [16].…”
The k-deck problem is concerned with finding the smallest value S(k) of a positive integer n such that there exist at least two strings of length n that share the same k-deck, i.e., the same multiset of subsequences of length k. We introduce the new problem of gapped k-deck reconstruction: For a given gap parameter s, we seek the smallest possible value of n, G_s(k), such that there exist at least two distinct strings of length n that cannot be distinguished based on a "gapped" set of k-subsequences. The gap constraint requires the elements in the subsequences to be at least s positions apart in the original string. Our results are as follows. First, we show how to construct sequences sharing the same 2-gapped k-deck using a nontrivial modification of the recursive Morse-Thue string construction procedure. This establishes the first known constructive upper bound on G_2(k). Second, we further improve this upper bound in a nonconstructive manner using the approach by Dudik and Schulman [6]. Third, we comment on the general case s ≥ 2 and present a number of open problems.
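The definitions in this abstract can be illustrated with a small brute-force sketch. It is not the paper's construction: the function names `gapped_deck` and `smallest_collision` are hypothetical, the gap constraint is read as "consecutive chosen indices differ by at least s" (so s = 1 gives the ordinary k-deck), and the search for S(k) is assumed to start at n = k, since for n < k every k-deck is empty.

```python
from collections import Counter
from itertools import combinations, product


def gapped_deck(x, k, s=1):
    """Multiset of length-k subsequences of x whose consecutive
    indices are at least s apart (assumed reading of the gap rule)."""
    deck = Counter()
    for idx in combinations(range(len(x)), k):
        if all(b - a >= s for a, b in zip(idx, idx[1:])):
            deck["".join(x[i] for i in idx)] += 1
    return deck


def smallest_collision(k, max_n=10):
    """Exhaustive search for the smallest n >= k at which two binary
    strings share the same ordinary k-deck (s = 1).
    Returns (n, first_string, second_string), or None if none is found."""
    for n in range(k, max_n + 1):
        seen = {}
        for bits in product("01", repeat=n):
            x = "".join(bits)
            key = frozenset(gapped_deck(x, k, 1).items())
            if key in seen:
                return n, seen[key], x
            seen[key] = x
    return None


# Ordinary 2-decks first collide at length 4.
print(smallest_collision(2))      # (4, '0110', '1001')
# With s = 2, only index pairs (0,2), (0,3), (1,3) of '0110' qualify.
print(gapped_deck("0110", 2, 2))
```

The exhaustive search is exponential in n and only practical for very small k. Extending it to G_s(k) for s ≥ 2 would require the paper's exact definition of the gapped deck and of the admissible range of n, which the abstract does not spell out.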
“…Then X_{3,5,6} = X_{4,5,6} = 011 and we check that X_011 = 2. Furthermore, (1,4,4,6), and D_3(X) = (X_000, X_001, X_010, X_011, X_100, X_101, X_110, X_111) = (0, 2, 0, 2, 2, 8, 2, 4).…”
Section: Problem Statement and Contributions (mentioning)
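The counts in the snippet above can be checked mechanically. The snippet does not show the string X itself, so the sketch below assumes X = 110011, which is one string consistent with the quoted values; `k_deck_vector` is a hypothetical helper name, and the vector is ordered lexicographically by pattern as in the quoted D_3(X).

```python
from collections import Counter
from itertools import combinations, product


def k_deck_vector(x, k):
    """Counts of every binary pattern of length k as a subsequence of x,
    listed in lexicographic order of the pattern (000, 001, ..., 111)."""
    counts = Counter("".join(sub) for sub in combinations(x, k))
    return tuple(counts["".join(w)] for w in product("01", repeat=k))


X = "110011"  # assumed example string, not stated in the snippet
print(k_deck_vector(X, 2))  # (1, 4, 4, 6)
print(k_deck_vector(X, 3))  # (0, 2, 0, 2, 2, 8, 2, 4)
```

Under this assumption, positions {3,5,6} and {4,5,6} (1-indexed) both spell 011, matching X_011 = 2.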
The $k$-deck of a sequence is defined as the multiset of all its subsequences of length $k$. Let $D_k(n)$ denote the number of distinct $k$-decks for binary sequences of length $n$. For the binary alphabet, we determine the exact value of $D_k(n)$ for small values of $k$ and $n$, and provide asymptotic estimates of $D_k(n)$ when $k$ is fixed.
Specifically, for fixed $k$, we introduce a trellis-based method to compute $D_k(n)$ in time polynomial in $n$. We then compute $D_k(n)$ for $k \in \{3,4,5,6\}$ and $k \leqslant n \leqslant 30$. We also improve the asymptotic upper bound on $D_k(n)$, and provide a lower bound thereupon. In particular, for the binary alphabet, we show that $D_k(n) = O\bigl(n^{(k-1)2^{k-1}+1}\bigr)$ and $D_k(n) = \Omega(n^k)$. For $k = 3$, we moreover show that $D_3(n) = \Omega(n^6)$ while the upper bound on $D_3(n)$ is $O(n^9)$.
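For very small parameters, the quantity D_k(n) defined in this abstract can be obtained by plain enumeration. The sketch below is an exhaustive count over all 2^n binary strings, not the trellis-based method the abstract refers to; `count_distinct_decks` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations, product


def k_deck(x, k):
    """Multiset of all length-k subsequences of the sequence x."""
    return Counter("".join(sub) for sub in combinations(x, k))


def count_distinct_decks(k, n):
    """Number of distinct k-decks over all binary strings of length n,
    found by enumerating every string (exponential in n)."""
    decks = {
        frozenset(k_deck(bits, k).items())
        for bits in product("01", repeat=n)
    }
    return len(decks)


print(count_distinct_decks(1, 5))  # 6: a 1-deck only records the number of ones
print(count_distinct_decks(2, 4))  # 15: only 0110 and 1001 share a 2-deck
```

Because the running time grows as 2^n, this is only useful as a sanity check against values computed by a polynomial-time method.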
“…String reconstruction refers to a large class of problems where information about a string can only be obtained in the form of multiple, incomplete and/or noisy observations. Examples of such problems are the reconstruction problem by Levenshtein [14], the trace reconstruction problem [3], and the k-deck problem [6], [7], [16], [24].…”
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound.