Linear time minimum segmentation enables scalable founder reconstruction

Background: We study a preprocessing routine relevant to pan-genomic analyses. Consider a set of aligned haplotype sequences of complete human chromosomes. Because such data are enormous, one would like to represent this input set with a few founder sequences that retain the contiguities of the original sequences as well as possible. Such a smaller set gives a scalable way to exploit pan-genomic information in further analyses (e.g. read alignment and variant calling). Optimizing the founder set is an NP-hard problem, but there is a segmentation formulation that can be solved in polynomial time, defined as follows. Given a threshold L and a set of m strings (haplotype sequences), each of length n, the minimum segmentation problem for founder reconstruction is to partition [1, n] into a set P of disjoint segments such that each segment has length at least L and the maximum, over all segments [a, b] ∈ P, of the number of distinct substrings at [a, b] is minimized. The distinct substrings in the segments represent founder blocks that can be concatenated to form founder sequences (as many as the largest number of distinct substrings in any segment) representing the original sequences, such that crossovers happen only at segment boundaries.

Results: We give an O(mn) time (i.e. linear time in the input size) algorithm to solve the minimum segmentation problem for founder reconstruction, improving over an earlier algorithm.

Conclusions: Our improvement makes it possible to apply the formulation to an input of thousands of complete human chromosomes. We implemented the new algorithm and give experimental evidence of its practicality. The implementation is available at https://github.com/tsnorri/founder-sequences.
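To make the objective concrete, here is a naive dynamic-programming sketch in Python. It is not the O(mn) algorithm described above: it re-examines every candidate segment and runs in O(mn²) expected time. The function name `min_segmentation`, the 0-based inclusive segment coordinates, and the way distinct substrings are counted (by refining row equivalence classes one column at a time) are illustrative assumptions.

```python
import math
from typing import List, Tuple


def min_segmentation(rows: List[str], L: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Naive O(m n^2) DP (illustrative, not the linear-time method).

    Splits the columns into segments of length >= L so that the largest
    number of distinct substrings in any segment is minimal.
    Returns (founder_count, segments) with 0-based inclusive segments.
    """
    m, n = len(rows), len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all rows must have the same length")
    best = [math.inf] * (n + 1)  # best[e]: optimum for the column prefix [0, e)
    back = [-1] * (n + 1)        # back[e]: start of the last segment in that optimum
    best[0] = 0
    for e in range(L, n + 1):
        ids = [0] * m  # rows that agree on columns [s, e) share an id
        for s in range(e - 1, -1, -1):
            # Extend the candidate segment one column to the left and refine classes.
            table: dict = {}
            ids = [table.setdefault((ids[k], rows[k][s]), len(table)) for k in range(m)]
            d = len(table)  # number of distinct substrings rows[k][s:e]
            if e - s >= L and best[s] != math.inf:
                cand = max(best[s], d)
                if cand < best[e]:
                    best[e], back[e] = cand, s
    if best[n] == math.inf:
        raise ValueError("no segmentation with every segment of length >= L")
    segments: List[Tuple[int, int]] = []
    e = n
    while e > 0:  # follow back pointers from the last column
        segments.append((back[e], e - 1))
        e = back[e]
    segments.reverse()
    return int(best[n]), segments


if __name__ == "__main__":
    print(min_segmentation(["AAAA", "AABB", "BBBB"], 2))  # (2, [(0, 1), (2, 3)])
```

With L = 2 the three toy rows split into two segments, each holding the blocks AA and BB, so two founder sequences suffice; with L = 3 only the whole range is allowed and all three rows remain distinct.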


Pal^k is Linear Recognizable Online

Kosolobov

Rubinchik

Shur

2015


LZ-End Parsing in Compressed Space

Kempa

Kosolobov

2017


We present an algorithm that constructs the LZ-End parsing (a variation of LZ77) of a given string of length n in O(n log ℓ) expected time and O(z + ℓ) space, where z is the number of phrases in the parsing and ℓ is the length of the longest phrase. As an option, we can fix ℓ (e.g., to the size of RAM), thereby obtaining a reasonable LZ-End approximation with the same functionality and with the length of phrases restricted by ℓ. This modified algorithm constructs the parsing in a streaming fashion in a single left-to-right pass over the input string w.h.p. and then performs one right-to-left pass to verify the correctness of the result. Experimentally comparing this version to other LZ77-based analogs, we show that it is of practical interest.
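The LZ-End definition itself can be illustrated with a brute-force Python sketch of one common formulation: each phrase copies the longest prefix of the remaining text that ends exactly where some earlier phrase ends, then appends one explicit character. This is only a slow reference for checking outputs on small strings; it does not implement the compressed-space, O(n log ℓ) expected-time algorithm above. The triple encoding `(source_end, copy_len, char)`, the function names, and the optional `max_phrase` cap (roughly mimicking the bounded-ℓ variant) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One phrase = (source_end, copy_len, char): copy text[source_end - copy_len:source_end],
# then append char. source_end is the end of an earlier phrase (0 when copy_len == 0).
Phrase = Tuple[int, int, str]


def lz_end_parse(s: str, max_phrase: Optional[int] = None) -> List[Phrase]:
    """Brute-force greedy LZ-End parse; max_phrase (hypothetical) caps phrase length."""
    if max_phrase is not None and max_phrase < 1:
        raise ValueError("max_phrase must be at least 1")
    n = len(s)
    phrases: List[Phrase] = []
    ends: List[int] = []  # exclusive end positions of the phrases emitted so far
    i = 0
    while i < n:
        cap = n - i - 1  # always keep one character for the explicit symbol
        if max_phrase is not None:
            cap = min(cap, max_phrase - 1)
        best_len, best_end = 0, 0
        for e in ends:
            # Longest length > best_len such that s[i:i+length] is a suffix of s[:e].
            for length in range(min(e, cap), best_len, -1):
                if s[i:i + length] == s[e - length:e]:
                    best_len, best_end = length, e
                    break
        phrases.append((best_end, best_len, s[i + best_len]))
        i += best_len + 1
        ends.append(i)
    return phrases


def lz_end_decode(phrases: List[Phrase]) -> str:
    """Rebuild the text; each copy source lies entirely in already decoded text."""
    text = ""
    for source_end, copy_len, char in phrases:
        text += text[source_end - copy_len:source_end] + char
    return text


if __name__ == "__main__":
    print(lz_end_parse("aaaa"))  # [(0, 0, 'a'), (1, 1, 'a'), (0, 0, 'a')]
```

Because every copy ends at a phrase boundary, a decoder only ever needs to extract a suffix of a decoded prefix, which is what makes LZ-End attractive for random access compared with plain LZ77.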


Linear Time Maximum Segmentation Problems in Column Stream Model

Cazaux

Kosolobov

Mäkinen

et al. 2019


We study a lossy compression scheme linked to the biological problem of founder reconstruction: the goal in founder reconstruction is to replace a set of strings with a smaller set of founders such that the original connections are maintained as well as possible. A general formulation of this problem is NP-hard, but when reconstructions are limited to those that form a segmentation of the input strings, polynomial-time solutions exist. In our earlier work (WABI 2018) we proposed a linear-time solution to a formulation where the minimum segment length was bounded, but it was left open whether the same running time can be obtained when the targeted compression level (number of founders) is bounded and lossiness is minimized. This optimization is captured by the Maximum Segmentation problem: given a threshold M and a set R = {R1, ..., Rm} of strings of the same length n, find a minimum-cost partition P such that for each segment [i, j] ∈ P, the compression level |{Rk[i, j] : 1 ≤ k ≤ m}| is at most M. We give linear-time algorithms to solve the problem for two different (compression quality) measures on P: the average length of the intervals of the partition and the length of the minimal interval of the partition. These algorithms make use of the positional Burrows-Wheeler transform and the range maximum queue, an extension of range maximum queries to the case where the input string is operated on as a queue. For the latter, we present a new solution that may be of independent interest. The solutions work in a streaming model where one column of the input strings is introduced at a time.
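For the average-length measure, a much simpler idea already fits the column stream model, because the number of distinct substrings can only grow when a segment is extended. The Python sketch below is a greedy illustration, not the positional-BWT and range-maximum-queue method described above, and it does not handle the minimal-interval measure. It keeps, for every row, the id of its equivalence class (rows that agree on the current segment share an id) and refines these ids as each column arrives, for O(mn) expected time with hashing. The function name and the 0-based inclusive coordinates are assumptions.

```python
from typing import List, Tuple


def greedy_max_segmentation(rows: List[str], M: int) -> List[Tuple[int, int]]:
    """Greedy sketch: fewest segments such that every segment has at most M
    distinct substrings. Reads one column at a time; segments are 0-based,
    inclusive (start, end) pairs."""
    m, n = len(rows), len(rows[0])
    if any(len(r) != n for r in rows):
        raise ValueError("all rows must have the same length")
    segments: List[Tuple[int, int]] = []
    start = 0
    ids = [0] * m  # rows equal on columns [start, col) share an id
    for col in range(n):
        # Refine the classes with the incoming column; len(table) = distinct count.
        table: dict = {}
        new_ids = [table.setdefault((ids[k], rows[k][col]), len(table)) for k in range(m)]
        if len(table) > M:
            if col == start:
                raise ValueError(f"column {col} alone has more than M distinct symbols")
            # Extending would exceed M: close the segment and restart at col.
            segments.append((start, col - 1))
            start = col
            table = {}
            new_ids = [table.setdefault(rows[k][col], len(table)) for k in range(m)]
            if len(table) > M:
                raise ValueError(f"column {col} alone has more than M distinct symbols")
        ids = new_ids
    if n > 0:
        segments.append((start, n - 1))
    return segments


if __name__ == "__main__":
    print(greedy_max_segmentation(["AAAA", "AABB", "BBBB"], 2))  # [(0, 1), (2, 3)]
```

Assuming the average length is n/|P|, maximizing it for fixed n means minimizing the number of segments; since the distinct count is monotone under interval inclusion, closing a segment only when the next column would push it above M yields the fewest segments by a standard exchange argument.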


Palindromic Length in Linear Time

Borozdin

Kosolobov

Rubinchik

et al. 2017


Dmitry Kosolobov
