2000
DOI: 10.1089/10665270050081360

Probabilistic and Statistical Properties of Words: An Overview

Abstract: In the following, an overview is given on statistical and probabilistic properties of words, as occurring in the analysis of biological sequences. Counts of occurrence, counts of clumps, and renewal counts are distinguished, and exact distributions as well as normal approximations, Poisson process approximations, and compound Poisson approximations are derived. Here, a sequence is modelled as a stationary ergodic Markov chain; a test for determining the appropriate order of the Markov chain is described. The c…
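
As a rough illustration of the word counts the abstract refers to, the following sketch computes the expected number of overlapping occurrences of a word in a sequence modelled as a stationary first-order Markov chain, using E[N(n)] = (n - m + 1) · π(w1) · P(w1, w2) · … · P(w(m-1), wm) for a word of length m. It is not taken from the paper: the function names, the uniform transition matrix, and the use of power iteration to approximate the stationary distribution are all assumptions made for this example.

```python
def stationary_distribution(P, alphabet, iters=1000):
    """Approximate the stationary distribution of P by power iteration.

    Assumes the chain is ergodic, so the iteration converges.
    """
    pi = {a: 1.0 / len(alphabet) for a in alphabet}
    for _ in range(iters):
        pi = {b: sum(pi[a] * P[a][b] for a in alphabet) for b in alphabet}
    return pi


def word_probability(word, P, pi):
    """Probability that `word` starts at a fixed position of the stationary chain."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob


def expected_count(word, n, P, pi):
    """Expected number of (possibly overlapping) occurrences in a sequence of length n."""
    positions = max(n - len(word) + 1, 0)
    return positions * word_probability(word, P, pi)


def count_overlapping(word, seq):
    """Observed overlapping count of `word` in `seq`, for comparison with the expectation."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))


# Hypothetical model: uniform transitions over the DNA alphabet.
alphabet = "ACGT"
P = {a: {b: 0.25 for b in alphabet} for a in alphabet}
pi = stationary_distribution(P, alphabet)

print(expected_count("ACA", 1000, P, pi))  # 998 positions * (1/4)**3 = 15.59375
print(count_overlapping("AA", "AAAA"))     # 3 overlapping occurrences
```

Only the expectation is covered here; the variance and the approximating distributions depend on the overlap structure of the word and are not sketched.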

Citations: Cited by 221 publications (168 citation statements)
References: References 59 publications
“…This result generalizes the well known formula for the probability of a word occurrence in a Markovian sequence of letters, see [5].…”
Section: Introduction (supporting)
confidence: 84%
“…Reinert et al [5] have called N(n) count of word w. In this section we shall derive E(N(n)) and Var(N(n)) under certain conditions.…”
Section: The Number Of Overlapping Occurrences (mentioning)
confidence: 99%
“…So these data sets are very diverse. We also tested these datasets using the Euclidean distance, because this distance is widely used in the time series data mining community [7,12].…”
Section: Empirical Evaluation (mentioning)
confidence: 99%
“…Statistical significance of over-representation of these word patterns provides valuable clues to biologists. Consequently, much work has been done on the use of asymptotic limiting distributions to approximate these p-values (Prum et al, 1995; Reinert et al, 2000; Régnier, 2000; Robin et al, 2002; Huang et al, 2004; Leung et al, 2005; Mitrophanov and Borodovsky, 2006; Pape et al, 2008). However, the approximations may not be accurate for short words or for words consisting of repeats and most theoretical approximations work only in specific settings.…”
Section: Introduction S (mentioning)
confidence: 99%