2000
DOI: 10.1103/physrevlett.85.1342

Finding Borders between Coding and Noncoding DNA Regions by an Entropic Segmentation Method

Abstract: We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions a…
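As a rough illustration of feature (i), the sketch below tags every base with its position modulo 3, giving 4 bases × 3 positions = 12 symbols. The function name `to_twelve_letter`, the `frame` parameter, and the symbol format (`A1`, `C2`, ...) are assumptions made for this example; the abstract does not specify how the paper encodes the alphabet.

```python
def to_twelve_letter(seq, frame=0):
    """Tag each base with a codon position 1..3 (hypothetical encoding).

    `frame` is the index of a base assumed to sit at codon position 1.
    Only the position modulo 3 matters, so no gene annotation is needed.
    """
    return [f"{base}{(i - frame) % 3 + 1}" for i, base in enumerate(seq.upper())]


symbols = to_twelve_letter("atgc")
print(symbols)  # ['A1', 'T2', 'G3', 'C1']
```

A segmentation over these symbols compares 12-component frequency vectors instead of 4-component ones, so a stretch with strong position-specific composition looks different from one without it.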

Cited by 119 publications (111 citation statements)
References 20 publications
“…The number of parameters in the two models are $K_2 = 7$ (the partition point $i$ is also a free parameter) and $K_1 = 3$. So $2N D_{JS}$ under the null hypothesis should obey the $\chi^2_{df=4}$ distribution (the same conclusion was reached before, see [6] and (I Grosse, et al in preparation), only the df used there is 3, instead 4).…”
Section: The Divide-and-conquer Segmentation As a Likelihood Ratio Test (mentioning)
confidence: 73%
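The test described in this statement can be sketched numerically. For a 4-letter alphabet the two-segment model has 3 + 3 frequency parameters plus the partition point (7 in total) against 3 for the one-segment model, hence 4 degrees of freedom. The sketch assumes that the divergence is computed with natural logarithms, so that 2N·D_JS is a log-likelihood-ratio statistic, and uses the closed-form chi-square tail for 4 degrees of freedom. The function names are hypothetical, and the chi-square reference distribution is the approximation claimed in the quoted statement, not an exact result.

```python
import math


def chi2_sf_df4(x):
    """Upper tail P(X > x) of a chi-square variable with 4 degrees of freedom.

    For even df the tail has a closed form; with df = 4 it is
    exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def split_p_value(n, d_js_nats):
    """Approximate p-value of a candidate split (illustrative helper).

    n          sequence length N
    d_js_nats  Jensen-Shannon divergence of the split, in nats
    """
    return chi2_sf_df4(2.0 * n * d_js_nats)


# A split with D_JS = 0.01 nats in a sequence of 1000 bases gives 2*N*D_JS = 20.
print(split_p_value(1000, 0.01))
```

A divergence measured in bits would need to be multiplied by ln 2 before being passed to `split_p_value`.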
“…The most commonly used procedure [4,5,6] is based on maximization of the Jensen-Shannon (J-S) divergence through which a given DNA string is recursively separated into compositionally homogeneous segments called domains (or patches). This results in a coarse-grained description of the DNA string as a sequence of distinct domains.…”
Section: Introduction (mentioning)
confidence: 99%
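The recursive procedure described in this statement can be sketched as follows: find the cut that maximizes the Jensen-Shannon divergence between the left and right parts, accept it if the divergence is large enough, then repeat on each part. This is a minimal sketch, not the cited papers' implementation. The names `best_split` and `segment` are hypothetical, and the stopping rule (a fixed divergence threshold `min_div`, in bits) is an assumption standing in for the significance criteria used in the literature.

```python
import math
from collections import Counter


def shannon_entropy(counts, total):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(seq):
    """Return (cut, divergence) maximizing the J-S divergence of seq[:cut] vs seq[cut:]."""
    n = len(seq)
    if n < 2:
        return None, 0.0
    total = Counter(seq)
    h_all = shannon_entropy(total, n)
    left = Counter()
    best_cut, best_div = None, -1.0
    for cut in range(1, n):
        left[seq[cut - 1]] += 1
        right = total - left  # Counter subtraction keeps only positive counts
        div = (h_all
               - cut / n * shannon_entropy(left, cut)
               - (n - cut) / n * shannon_entropy(right, n - cut))
        if div > best_div:
            best_cut, best_div = cut, div
    return best_cut, best_div


def segment(seq, min_div, offset=0):
    """Recursively split seq; return sorted cut positions in original coordinates."""
    cut, div = best_split(seq)
    if cut is None or div < min_div:
        return []
    return (segment(seq[:cut], min_div, offset)
            + [offset + cut]
            + segment(seq[cut:], min_div, offset + cut))


print(segment("AAAAACCCCC", min_div=0.5))  # [5]
```

Because `best_split` accepts any sequence of hashable symbols, it works unchanged on a list of position-tagged symbols as well as on a plain string of bases.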
“…A domain set may thus be interpreted as a larger homogeneous sequence, parts of which are scattered nonuniformly in a genomic sequence. The number of domain sets constructed thus is found to be much fewer than the domains obtained upon segmentation [4,5,6,7]. We propose here an optimal procedure, starting from the domains found from one of the above segmentation methods, and building up a domain set by adding together all its components.…”
Section: Introduction (mentioning)
confidence: 99%
“…This type of finite size effects is of major relevance for statistical analysis of DNA and other biosequences, e.g. [4,5,6,7]. In this article we want to calculate the expected frequency distribution which one finds in dependence on the sample size M.…”
Section: Introduction (mentioning)
confidence: 99%