Paola Bonizzoni scite author profile

The perfect phylogeny is one of the most widely used models in different areas of computational biology. In this paper we consider the Persistent Perfect Phylogeny problem (referred to as P-PP), recently introduced to extend the perfect phylogeny model by allowing persistent characters, that is, characters that can be gained and lost at most once. We define a natural generalization of the P-PP problem obtained by requiring that for some pairs (character, species), neither the species nor any of its ancestors can have the character. In other words, some characters cannot be persistent for some species. This new problem is called Constrained P-PP (CP-PP). Based on a graph formulation of the CP-PP problem, we provide a polynomial-time solution for the CP-PP problem for matrices having an empty conflict-graph. In particular, we show that all such matrices admit a persistent perfect phylogeny in the unconstrained case. Using this result, we develop a parameterized algorithm for solving the CP-PP problem where the parameter is the number of characters. A preliminary experimental analysis of the algorithm shows that it performs efficiently and that it may analyze real haplotype data not conforming to the classical perfect phylogeny model.
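As background for the abstract above, the classical (non-persistent) perfect phylogeny model can be checked with the textbook three-gamete test: a binary matrix with an all-zero root admits a perfect phylogeny exactly when no pair of columns shows all of the patterns (0,1), (1,0) and (1,1). The sketch below is only an illustration of that classical test. It is not the CP-PP algorithm, and it is not claimed to match the conflict-graph definition used in the paper; the function name `conflicting_pairs` and the row/column layout (rows are species, columns are characters) are assumptions.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return pairs of columns that fail the classical three-gamete test.

    matrix: list of rows (species), each a list of 0/1 character states.
    Assumption: the root of the phylogeny has state 0 for every character.
    """
    forbidden = {(0, 1), (1, 0), (1, 1)}
    n_chars = len(matrix[0])
    conflicts = []
    for i, j in combinations(range(n_chars), 2):
        # Collect the state pairs that actually occur in columns i and j.
        seen = {(row[i], row[j]) for row in matrix}
        if forbidden <= seen:
            conflicts.append((i, j))
    return conflicts
```

An empty result means the matrix fits the classical model; a non-empty result lists the character pairs that do not, which is the kind of data the persistent model is meant to accommodate.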

HapCol: accurate and memory-efficient haplotype assembly from long reads

Pirola

Zaccaria

Dondi

et al. 2015

Motivation: Haplotype assembly is the computational problem of reconstructing haplotypes in diploid organisms and is of fundamental importance for characterizing the effects of single-nucleotide polymorphisms on the expression of phenotypic traits. Haplotype assembly benefits greatly from the advent of 'future-generation' sequencing technologies and their capability to produce long reads at increasing coverage. Existing methods are not able to deal with such data in a fully satisfactory way, either because accuracy or performance degrades as read length and sequencing coverage increase or because they are based on restrictive assumptions. Results: By exploiting a feature of future-generation technologies, namely the uniform distribution of sequencing errors, we designed an exact algorithm, called HAPCOL, that is exponential in the maximum number of corrections for each single-nucleotide polymorphism position and that minimizes the overall error-correction score. We performed an experimental analysis, comparing HAPCOL with the current state-of-the-art combinatorial methods on both real and simulated data. On a standard benchmark of real data, we show that HAPCOL is competitive with state-of-the-art methods, improving the accuracy and the number of phased positions. Furthermore, experiments on realistically simulated datasets revealed that HAPCOL requires significantly fewer computing resources, especially memory. Thanks to its computational efficiency, HAPCOL can overcome the limits of previous approaches, making it possible to phase datasets with higher coverage and without the traditional all-heterozygous assumption.

The complexity of multiple sequence alignment with SP-score that is a metric

Bonizzoni

Vedova

2001

Theoretical Computer Science

The structure of reflexive regular splicing languages via Schützenberger constants

Bonizzoni

Felice

Zizza

2005

Theoretical Computer Science

The splicing operation was introduced in 1987 by Head as a mathematical model of the recombination of DNA molecules under the influence of restriction and ligase enzymes. This operation allows us to define a computing (language-generating) device, called a splicing system. Other variants of this original definition were also proposed by Paun and Pixton, respectively. The computational power of splicing systems has been thoroughly investigated. Nevertheless, an interesting problem is still open, namely the characterization of the class of regular languages generated by finite splicing systems. In this paper, we solve the problem for a special class of finite splicing systems, termed reflexive splicing systems, according to each of the definitions of splicing given by Paun and Pixton. This special class of systems contains, in particular, finite Head splicing systems. The notion of a constant, given by Schützenberger, once again intervenes.
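To make the splicing operation concrete, here is a minimal sketch of a single splicing step in the style of Paun's definition: a rule (u1, u2; u3, u4) applied to x = x1 u1 u2 x2 and y = y1 u3 u4 y2 yields x1 u1 u4 y2. The function name `splice` and the encoding of a rule as a 4-tuple of strings are assumptions made for illustration; the sketch does not cover the Head or Pixton variants, nor the reflexivity condition studied in the paper.

```python
def splice(x, y, rule):
    """Apply one Paun-style splicing rule to the pair of words (x, y).

    rule: tuple (u1, u2, u3, u4) of strings.
    For every occurrence x = x1 u1 u2 x2 and y = y1 u3 u4 y2,
    the word x1 u1 u4 y2 is produced. Returns the set of all such words.
    """
    u1, u2, u3, u4 = rule
    left_site, right_site = u1 + u2, u3 + u4
    results = set()
    i = x.find(left_site)
    while i != -1:
        j = y.find(right_site)
        while j != -1:
            # Keep x up to the end of u1, then continue with y from the start of u4.
            results.add(x[: i + len(u1)] + y[j + len(u3):])
            j = y.find(right_site, j + 1)
        i = x.find(left_site, i + 1)
    return results
```

For example, applying the rule ("a", "cg", "g", "cg") to the words "aacgtt" and "ggcgaa" cuts the first word after "aa" and the second word before "cgaa", giving "aacgaa". A splicing system iterates such steps over an initial set of words to generate a language.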

ASPicDB: A database resource for alternative splicing analysis

Castrignanò

D’Antonio

Anselmo

et al. 2008

Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses

Ciccolella

Ricketts

Gomez

et al. 2020

Motivation: In recent years, the well-known Infinite Sites Assumption (ISA) has been a fundamental feature of computational methods devised for reconstructing tumor phylogenies and inferring cancer progressions. However, recent studies leveraging Single-Cell Sequencing (SCS) techniques have shown evidence of the widespread recurrence and, especially, loss of mutations in several tumor samples. While there exist established computational methods that infer phylogenies with mutation losses, there remain some advancements to be made. Results: We present SASC (Simulated Annealing Single-Cell inference): a new and robust approach based on simulated annealing for the inference of cancer progression from SCS data sets. In particular, we introduce an extension of the model of evolution where mutations are only accumulated, by allowing also a limited amount of mutation loss in the evolutionary history of the tumor: the Dollo-k model. We demonstrate that SASC achieves high levels of accuracy when tested on both simulated and real data sets and in comparison with some other available methods. Availability: The Simulated Annealing Single-Cell inference (SASC) tool is open source and available at https://github.com/sciccolella/sasc. Supplementary information: Supplementary data are available at Bioinformatics online.

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes

Bonizzoni

Dondi

Klau

et al. 2016

Journal of Computational Biology

In diploid genomes, haplotype assembly is the computational problem of reconstructing the two parental copies, called haplotypes, of each chromosome starting from sequencing reads, called fragments, possibly affected by sequencing errors. Minimum error correction (MEC) is a prominent computational problem for haplotype assembly and, given a set of fragments, aims at reconstructing the two haplotypes by applying the minimum number of base corrections. MEC is computationally hard to solve, but some approximation-based or fixed-parameter approaches have proved capable of obtaining accurate results on real data. In this work, we expand the current characterization of the computational complexity of MEC from the approximation and the fixed-parameter tractability points of view. In particular, we show that MEC is not approximable within a constant factor, whereas it is approximable within a logarithmic factor in the size of the input. Furthermore, we answer open questions on the fixed-parameter tractability for parameters of classical or practical interest: the total number of corrections and the fragment length. In addition, we present a direct 2-approximation algorithm for a variant of the problem that has also been applied in the framework of clustering data. Finally, since polyploid genomes, such as those of plants and fishes, are composed of more than two copies of the chromosomes, we introduce a novel formulation of MEC, namely the k-ploid MEC problem, that extends the traditional problem to deal with polyploid genomes. We show that the novel formulation is still both computationally hard and hard to approximate. Nonetheless, from the parameterized point of view, we prove that the problem is tractable for parameters of practical interest such as the number of haplotypes and the coverage, or the number of haplotypes and the fragment length.
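The MEC objective described above can be illustrated on a toy instance. The sketch below scores a candidate pair of haplotypes by assigning each fragment to the closer haplotype and counting mismatches, then finds an optimum by exhaustive search. It is exponential in the number of positions and is meant only to make the definition concrete; it is not any of the algorithms from the paper. The names `mec_cost` and `brute_force_mec`, the 0/1 string encoding, and the use of "-" for positions a fragment does not cover are all assumptions.

```python
from itertools import product


def mec_cost(fragments, h1, h2):
    """Number of base corrections needed so that each fragment matches h1 or h2.

    fragments: equal-length strings over '0', '1', '-' ('-' = not covered).
    h1, h2: candidate haplotypes as sequences of '0'/'1' of the same length.
    """
    total = 0
    for frag in fragments:
        d1 = sum(1 for a, b in zip(frag, h1) if a != "-" and a != b)
        d2 = sum(1 for a, b in zip(frag, h2) if a != "-" and a != b)
        total += min(d1, d2)  # assign the fragment to the closer haplotype
    return total


def brute_force_mec(fragments):
    """Exhaustively search all haplotype pairs; feasible only for tiny inputs."""
    n = len(fragments[0])
    best = None
    for h1 in product("01", repeat=n):
        for h2 in product("01", repeat=n):
            cost = mec_cost(fragments, h1, h2)
            if best is None or cost < best[0]:
                best = (cost, "".join(h1), "".join(h2))
    return best
```

For the fragments "010-", "0101", "101-", "-010" and "0111", the haplotype pair "0101" / "1010" needs a single correction (in the last fragment), and no pair does better because "0101", "101-" and "0111" disagree pairwise.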

Paola Bonizzoni

The Haplotyping problem: An overview of computational models and solutions

The binary perfect phylogeny with persistent characters

HapCol: accurate and memory-efficient haplotype assembly from long reads

The complexity of multiple sequence alignment with SP-score that is a metric

The structure of reflexive regular splicing languages via Schützenberger constants

ASPicDB: A database resource for alternative splicing analysis

Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses

On the Minimum Error Correction Problem for Haplotype Assembly in Diploid and Polyploid Genomes
