2020
DOI: 10.3150/19-bej1181

Matching strings in encoded sequences

Abstract: We investigate the length of the longest common substring for encoded sequences and its asymptotic behaviour. The main result is a strong law of large numbers for a re-scaled version of this quantity, which presents an explicit relation with the Rényi entropy of the source. We apply this result to the zero-inflated contamination model and the stochastic scrabble. In the case of dynamical systems, this problem is equivalent to the shortest distance between two observed orbits and its limiting relationship with …
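
To make the quantity in the abstract concrete, here is a minimal sketch that computes the longest common substring of two encoded sequences and prints it next to an entropy-based scale. Several things are assumptions for illustration only and are not taken from the paper: the encoder is a letter-by-letter lookup table (`table`), the source is i.i.d. with hypothetical probabilities, the normalisation is M_n / log n, and the Rényi entropy is the order-2 (collision) entropy, as in the classical i.i.d. matching setting. The paper's exact statement and hypotheses are not reproduced in this excerpt.

```python
import math
import random


def longest_common_substring(a, b):
    """Length of the longest contiguous block common to a and b (O(len(a)*len(b)))."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common block that ended one step earlier in both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def encode(seq, table):
    """Hypothetical letter-by-letter encoder: replace each symbol via a lookup table."""
    return [table[s] for s in seq]


def pushforward(probs, table):
    """Distribution of the encoded symbol when the source symbol has distribution probs."""
    out = {}
    for s, p in probs.items():
        out[table[s]] = out.get(table[s], 0.0) + p
    return out


def renyi_entropy_2(probs):
    """Order-2 Renyi (collision) entropy: -log of the sum of squared probabilities."""
    return -math.log(sum(p * p for p in probs))


if __name__ == "__main__":
    rng = random.Random(0)
    source = {"a": 0.5, "b": 0.3, "c": 0.2}   # assumed source distribution
    table = {"a": "a", "b": "x", "c": "x"}    # assumed encoder: merges b and c
    n = 500
    letters, weights = zip(*source.items())
    x = rng.choices(letters, weights, k=n)
    y = rng.choices(letters, weights, k=n)
    m_n = longest_common_substring(encode(x, table), encode(y, table))
    h2 = renyi_entropy_2(pushforward(source, table).values())
    print("M_n / log n =", m_n / math.log(n), " 2 / H_2 =", 2 / h2)
```

Because the encoder merges symbols, the encoded sequences collide more often than the raw ones, so the entropy of the encoded distribution (not that of the source alphabet) is the natural scale to compare against.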

Cited by 5 publications (5 citation statements)
References 42 publications
“…We could also consider, as in the first section of the paper, a non-trivial f. In the context of sequence matching, f is called the encoding function (or encoder) and can model different treatments of the original source of information [14]. The clustering structure is however in this case too complex to yield such a general result.…”
Section: Definition (mentioning)
confidence: 99%
“…Closely related problems have gained interest in recent years. The case of real, unobserved trajectories was considered in [19] and [11], using EVT techniques, while asymptotic results for the shortest distance between two orbits were obtained in [7] and then generalized to multiple orbits [8] and finally to observed orbits [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…If the process is α-mixing with an exponential decay (or ψ-mixing with polynomial decay) and if for cylinders C_n of length n in B^n, their preimage f^{-1}C_n is of length at most h(n) with h(n) = o(n^γ) for some γ > 0, one could use the ideas of Coutinho et al (2020) to get a lower bound. Nevertheless, the run-length encoder does not satisfy this last necessary assumption since preimage of cylinders under f can have arbitrary length.…”
Section: Longest Common Substring For RLE Sequences (mentioning)
confidence: 99%
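
The remark about the run-length encoder can be illustrated with a small sketch. The encoder below is one common formulation of run-length encoding (each maximal run becomes a `(symbol, run length)` pair); it is an assumption for illustration and may differ in detail from the definition used in the citing paper. It shows that a single encoded symbol can depend on an arbitrarily long stretch of the source.

```python
from itertools import groupby


def run_length_encode(seq):
    """Encode each maximal run of equal symbols as a (symbol, run_length) pair."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]


if __name__ == "__main__":
    for k in (1, 5, 50):
        source = "a" * k + "b"
        first = run_length_encode(source)[0]
        # The first encoded symbol is only known after reading k + 1 source
        # symbols: k copies of "a" and one symbol that ends the run.
        print(k, first, "source symbols needed:", k + 1)
```

Since k is unbounded, no fixed function h(n) can bound the source length needed to determine n encoded symbols, which is the obstruction described in the statement above.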
“…In Coutinho et al (2020), the authors wondered if the above mentioned result holds if the sequences are transformed following certain rules of modification. Thus, if f is a measurable function (called an encoder) transforming a sequence x into another sequence f(x), they studied the behaviour of M_n(f(x), f(y)) and obtain a relation with the Rényi entropy of the pushforward measure f_*µ.…”
Section: Introduction (mentioning)
confidence: 99%