We present a new class of binary words: the prefix normal words. They are defined by the property that for any given length k, no factor of length k has more a's than the prefix of the same length. These words arise in the context of indexing for jumbled pattern matching (a.k.a. permutation matching or Parikh vector matching), where the aim is to decide whether a string has a factor with a given multiplicity of characters, i.e., with a given Parikh vector. Using prefix normal words, we give the first non-trivial characterization of binary words having the same set of Parikh vectors of factors. We prove that the language of prefix normal words is not context-free and is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words. We discuss further properties and state open problems.
In combinatorics of words, a concatenation of k consecutive equal blocks is called a power of order k. In this paper we take a different point of view and define an anti-power of order k as a concatenation of k consecutive pairwise distinct blocks of the same length. As a main result, we show that every infinite word contains powers of any order or anti-powers of any order. That is, the existence of powers or anti-powers is an unavoidable regularity. Indeed, we prove a stronger result, which relates the density of anti-powers to the existence of a factor that occurs with arbitrary exponent. As a consequence, we show that in every aperiodic uniformly recurrent word, anti-powers of every order begin at every position. We further show that every infinite word avoiding anti-powers of order 3 is ultimately periodic, while there exist aperiodic words avoiding anti-powers of order 4. We also show that there exist aperiodic recurrent words avoiding anti-powers of order 6.
The Parikh vector p(s) of a string s over a finite ordered alphabet Σ = {a 1 , . . . , aσ} is defined as the vector of multiplicities of the characters, p(s) = (p 1 , . . . , pσ), where p i = |{j | s j = a i }|. Parikh vector q occurs in s if s has a substring t with p(t) = q. The problem of searching for a query q in a text s of length n can be solved simply and worst-case optimally with a sliding window approach in O(n) time. We present two novel algorithms for the case where the text is fixed and many queries arrive over time.The first algorithm only decides whether a given Parikh vector appears in a binary text. It uses a linear size data structure and decides each query in O(1) time. The preprocessing can be done trivially in Θ(n 2 ) time.The second algorithm finds all occurrences of a given Parikh vector in a text over an arbitrary alphabet of size σ ≥ 2 and has sub-linear expected time complexity. More precisely, we present two variants of the algorithm, both using an O(n) size data structure, each of which can be constructed in O(n) time. The first solution is very simple and easy to implement and leads to an expected query time of O(n( σ log σ ) 1/2 log m √ m ), where m = i q i is the length of a string with Parikh vector q. The second uses wavelet trees and improves the expected runtime to O(n( σ log σ ) 1/2 1 √ m ), i.e., by a factor of log m. 357 Int. J. Found. Comput. Sci. 2012.23:357-374. Downloaded from www.worldscientific.com by MCMASTER UNIVERSITY on 02/20/15. For personal use only. 358 P. Burcsi et al.Notice that this is an overestimate, since line 7 is only executed if no occurrence was found after the current update of R (line 4). Standard algebraic manipulations using Jensen's inequality (see, e.g. [16]) yield J i=1 log(R i −R i−1 + m) ≤ J log n J + m . Therefore we obtain
We give an O(n log n)-time, O(n)-space algorithm for factoring a string into the minimum number of palindromic substrings. That is, given a string S[1..n], in O(n log n) time our algorithm returns the minimum number of palindromes S 1 , . . . , S ℓ such that S = S 1 · · · S ℓ . We also show that the time complexity is O(n) on average and Ω(n log n) in the worst case. The last result is based on a characterization of the palindromic structure of Zimin words.
Abstract-We give a polynomial-time construction of the set of sequences that satisfy a finite-memory constraint defined by a finite list of forbidden blocks, with a specified set of bit positions unconstrained. Such a construction can be used to build modulation/error-correction codes (ECC codes) like the ones defined by the Immink-Wijngaarden scheme in which certain bit positions are reserved for ECC parity. We give a lineartime construction of a finite-state presentation of a constrained system defined by a periodic list of forbidden blocks. These systems, called periodic-finite-type systems, were introduced by Moision and Siegel. Finally, we present a linear-time algorithm for constructing the minimal periodic forbidden blocks of a finite sequence for a given period.
A 1-prefix normal word is a binary word with the property that no factor has more 1s than the prefix of the same length; a 0-prefix normal word is defined analogously. These words arise in the context of indexed binary jumbled pattern matching, where the aim is to decide whether a word has a factor with a given number of 1s and 0s (a given Parikh vector). Each binary word has an associated set of Parikh vectors of the factors of the word. Using prefix normal words, we provide a characterization of the equivalence class of binary words having the same set of Parikh vectors of their factors.We prove that the language of prefix normal words is not context-free and is strictly contained in the language of pre-necklaces, which are prefixes of powers of Lyndon words. We give enumeration results on pnw(n), the number of prefix normal words of length n, showing that, for sufficiently large n,For fixed density (number of 1s), we show that the ordinary generating function of the number of prefix normal words of length n and density d is a rational function. Finally, we give experimental results on pnw(n), discuss further properties, and state open problems.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.