Abstract: We develop and study the concept of similarity functions for q-ary sequences. For the case q = 4, these functions can be used for a mathematical model of the DNA duplex energy [1,2], which has a number of applications in molecular biology. Based on these similarity functions, we define a concept of DNA codes [1]. We give brief proofs for some of our unpublished results [3] connected with the well-known deletion similarity function [4][5][6]. This function is the length of the longest common subsequence; it is …
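The abstract identifies the deletion similarity with the length of the longest common subsequence of two sequences. A minimal sketch of that quantity, using the standard dynamic-programming recurrence, is shown below; the function name `lcs_length` is illustrative and does not come from the paper.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y.

    Standard O(len(x) * len(y)) dynamic programme that keeps only
    the previous row of the table.
    """
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                # Extend the best common subsequence of the two prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(lcs_length("ACGT", "AGT"))
```

Because a symbol may be skipped in either sequence, this measure tolerates shifts of one sequence along the other, which a position-by-position comparison does not.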
“…Identity (5) implies the symmetry property of hybridization energy between DNA sequences x and y [7]–[13]. Example 1: In [18] we considered constant weights w = w(a, b) ≡ 1, a, b ∈ A, for which the additive stem 1-similarity S_1(x, y), 0 ≤ S_1(x, y) ≤ S_1(x, x) = n − 1, is the above-mentioned number of stems in the longest common Hamming subsequence between x and y.…”
Section: A Notations and Definitions (mentioning)
confidence: 99%
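The quoted example can be illustrated with a short sketch. The excerpt does not give the formal definition of a stem, so the code assumes that a stem is a pair of adjacent positions i, i + 1 at which x and y agree, and that the additive similarity sums a weight w(a, b) over such pairs. The function name `stem_similarity` and the optional `weight` argument are illustrative, not taken from the cited papers.

```python
from typing import Callable, Optional


def stem_similarity(
    x: str,
    y: str,
    weight: Optional[Callable[[str, str], float]] = None,
) -> float:
    """Additive stem similarity of two equal-length sequences.

    Assumption: a stem is a pair of adjacent positions (i, i + 1) with
    x[i] == y[i] and x[i + 1] == y[i + 1]. With the default constant
    weight 1 the result is the number of such stems.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    total = 0
    for i in range(len(x) - 1):
        if x[i] == y[i] and x[i + 1] == y[i + 1]:
            total += 1 if weight is None else weight(x[i], x[i + 1])
    return total


if __name__ == "__main__":
    print(stem_similarity("ACGT", "ACCT"))
```

Under this reading, a sequence of length n compared with itself has n − 1 stems, which agrees with the bound S_1(x, x) = n − 1 in the quotation, and the value does not change when x and y are swapped.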
“… x_n) are identically distributed in accordance with the Markov chain (13) and, in virtue of (11), the corresponding reverse complement codewords x = (x_n x_{n−1} …”
Section: Bounds On Rate R_W(D) (mentioning)
confidence: 99%
“…When these collections consist entirely of pairs of mutually reverse complementary DNA strands they are called DNA tag-antitag systems [4] and DNA codes [7]–[13].…”
Section: Introduction (mentioning)
confidence: 99%
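The condition in the quotation above, a collection made up entirely of pairs of mutually reverse complementary strands, can be checked mechanically. The sketch below assumes the usual Watson-Crick pairing (A with T, C with G) and reads "pairs" as excluding words that are their own reverse complement; both helper names are hypothetical.

```python
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(x: str) -> str:
    """Reverse the strand and complement each base (A<->T, C<->G)."""
    return "".join(_COMPLEMENT[base] for base in reversed(x))


def consists_of_rc_pairs(code) -> bool:
    """True if every word's reverse complement is a different word of the code.

    Assumption: a self-reverse-complementary word (e.g. "ACGT") does not
    form a pair with itself, so its presence makes the check fail.
    """
    words = set(code)
    return all(
        reverse_complement(w) in words and reverse_complement(w) != w
        for w in words
    )


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(consists_of_rc_pairs(["AACG", "CGTT"]))
```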
“…Later attempts included deletion similarity [8], which was earlier introduced by Levenshtein [17], and block similarity [12]–[13]. Both functions are non-additive, which allowed for consideration of such cases as shifts of DNA sequences along each other.…”
Abstract: We consider DNA codes based on the nearest-neighbor (stem) similarity model, which adequately reflects the "hybridization potential" of two DNA sequences. Our aim is to present a survey of bounds on the rate of DNA codes with respect to a thermodynamically motivated similarity measure called an additive stem similarity. These results yield a method to analyze and compare known samples of the nearest-neighbor "thermodynamic weights" associated to stacked pairs that occurred in DNA secondary structures.
“…When these collections consist entirely of pairs of mutually reverse complementary DNA strands they are called DNA tag-antitag systems [3] and DNA codes [5]–[9].…”
We consider DNA codes based on the nearest-neighbor (stem) similarity model, which adequately reflects the "hybridization potential" of two DNA sequences. Our first aim is to discuss some optimal constructions of linear DNA codes called maximum distance separable (MDS) codes for stem distance. These constructions are compared with conventional MDS codes for Hamming distance. Our second aim is to present a survey of bounds on the rate of DNA codes with respect to a thermodynamically motivated similarity measure called an additive stem similarity. The given bounds yield a method to analyze and compare known samples of the nearest-neighbor "thermodynamic weights" associated to stacked pairs that occurred in DNA secondary structures.
We were motivated by three novel technologies, which exemplify a new design paradigm in high-throughput genomics: nanostring™; DNA-mediated Annealing, Selection, extension, and Ligation (DASL™); and multiplex real-time quantitative polymerase chain reaction (QPCR). All three are solution-hybridization based, and all three employ 10–1000 DNA sequence probes in a small volume, each probe specific for a particular sequence in a different human gene. nanostring™ uses 50-mer probes; DASL and multiplex QPCR use ∼20-mer probes. Assuming a 1-nM probe concentration in a 1 μL volume, there are 10⁻⁹ × 10⁻⁹ × 6.23 × 10²³, or 6.23 × 10⁵, molecules of each probe present in the reaction compared to 10–1000 target molecules. Excess probe drives the sensitivity of the reaction. We are interested in the limits of multiplexing, i.e., the probability that in such a design a particular probe would bind to any other, sequence-related probe rather than the intended, specific target. If this were to happen with appreciable frequency, it would result in much reduced sensitivity and potential failure of this design. We established upper and lower bounds for the probability that in a multiplex assay at least one probe would bind to another sequence-related probe rather than its cognate target. These bounds are reassuring, because for reasonable degrees of multiplexing (10³ probes) the probability of such an event is practically negligible. As the degree of multiplexing increases to ∼10⁶ probes, our theoretical boundaries gain practical importance and establish a principal upper limit for the use of highly multiplexed solution-based assays vis-à-vis solid-support anchored designs. WIREs Comput Stat 2015, 7:394–399. doi: 10.1002/wics.1364
This article is categorized under:
Applications of Computational Statistics > Genomics/Proteomics/Genetics
Data: Types and Structure > Microarrays