Stefan Ivanovic scite author profile

El-Kebir

2022

Preprint

Cancer results from an evolutionary process that typically yields multiple clones with varying sets of mutations within the same tumor. Here, we introduce CloMu (Clone To Mutation), a flexible and low-parameter tree-generative model of cancer evolution. CloMu uses a two-layer neural network trained via reinforcement learning to determine the probability of new mutations based on the existing mutations on a clone. CloMu supports several prediction tasks, including the determination of evolutionary trajectories, tree selection and prioritization, causality and interchangeability between mutations, and mutation fitness. Importantly, previous methods support only some of these tasks, and many suffer from overfitting on datasets with a large number of mutations. Using simulations, we demonstrate that CloMu either matches or outperforms current methods on a wide variety of prediction tasks. In particular, for simulated data with interchangeable mutations, current methods are unable to uncover causal relationships as effectively as CloMu. On breast cancer and leukemia cohorts, we show that CloMu determines similarities and causal relationships between mutations as well as the fitness of mutations. We validate CloMu's inferred mutation fitness values for the leukemia cohort by comparing them to temporal clonal proportion data not used during training, showing high concordance. In summary, CloMu's low-parameter model facilitates a wide range of prediction tasks regarding cancer evolution on increasingly available cohort-level datasets.
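The core idea in the abstract above, a two-layer network that maps a clone's existing mutations to a probability for each possible new mutation, can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the tanh nonlinearity, the masking of mutations already present, and the random untrained weights are all assumptions made for illustration, and the reinforcement learning step is not shown.

```python
import numpy as np

def next_mutation_probs(clone, W1, b1, W2, b2):
    """Return a probability for each mutation being the next one acquired.

    clone is a 0/1 vector marking the mutations already on the clone.
    The weights are hypothetical; a real model would learn them.
    """
    clone = np.asarray(clone, dtype=float)
    if clone.all():
        raise ValueError("clone already carries every mutation")
    hidden = np.tanh(clone @ W1 + b1)        # first layer
    logits = hidden @ W2 + b2                # second layer: one score per mutation
    # Assumption: a mutation already on the clone cannot be gained again.
    logits = np.where(clone > 0, -np.inf, logits)
    logits = logits - logits[np.isfinite(logits)].max()  # numerical stability
    weights = np.exp(logits)                 # exp(-inf) is 0 for masked entries
    return weights / weights.sum()

# Illustrative use with 5 mutations, 3 hidden units, and random weights.
rng = np.random.default_rng(0)
n_mutations, n_hidden = 5, 3
W1 = rng.normal(size=(n_mutations, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_mutations))
b2 = np.zeros(n_mutations)

probs = next_mutation_probs([1, 0, 0, 1, 0], W1, b1, W2, b2)
```

Here `probs` is a distribution over the three mutations the clone does not yet carry; the two mutations already present receive probability zero.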

A Filtering-Based Approach for Improving Crowdsourced GNSS Traces in a Data Update Context

Ivanovic¹,

Olteanu‐Raimond²,

Mustière³

et al. 2019

IJGI

Traces collected by citizens using GNSS (Global Navigation Satellite System) devices during sports activities such as running, hiking or biking are now widely available through different sport-oriented collaborative websites. The traces are collected by citizens for their own purposes and frequently shared with the sports community on the internet. Our research assumption is that crowdsourced GNSS traces may be a valuable source of information to detect updates in authoritative datasets. Despite their availability, the traces present some issues such as poor metadata, attribute incompleteness and heterogeneous positional accuracy. Moreover, certain parts of the traces (GNSS points composing the traces) result from displacements made off the existing paths. In our context (i.e., updating authoritative data), these off-path GNSS points are considered noise and should be filtered. Two types of noise are examined in this research: points representing secondary activities (e.g., having a lunch break) and points representing errors during the acquisition. We named the former secondary human behaviour (SHB) and the latter outliers. The goal of this paper is to improve the smoothness of traces by detecting and filtering both SHB and outliers. Two methods are proposed. The first one allows for the detection of secondary human behaviour by analysing only the geometry of the traces. The second one is a rule-based machine learning method that detects outliers by taking into account the intrinsic characteristics of the points composing the traces, as well as the environmental conditions during trace acquisition. The proposed approaches are tested on crowdsourced GNSS traces collected in mountain areas during sports activities.
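To make the notion of an acquisition outlier concrete, the sketch below drops points that would imply an implausible speed from the last kept point. It is a deliberately simple stand-in, not the rule-based machine learning method described in the abstract: the speed threshold, the `(lat, lon, t)` point layout, the function names, and the assumption that the first point of a trace is valid are all hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_speed_outliers(trace, max_speed_mps=12.0):
    """Keep points reachable from the last kept point at a plausible speed.

    trace is a list of (lat, lon, t) tuples with t in seconds.
    max_speed_mps is an assumed threshold, not a value from the paper.
    """
    if not trace:
        return []
    kept = [trace[0]]  # assumption: the first point is valid
    for lat, lon, t in trace[1:]:
        plat, plon, pt = kept[-1]
        dt = t - pt
        if dt <= 0:
            continue  # skip duplicated or out-of-order timestamps
        if haversine_m(plat, plon, lat, lon) / dt <= max_speed_mps:
            kept.append((lat, lon, t))
    return kept

# Illustrative trace: the third point jumps about 1.1 km in one second.
trace = [
    (45.00000, 6.0, 0),
    (45.00001, 6.0, 1),
    (45.01000, 6.0, 2),
    (45.00002, 6.0, 3),
]
cleaned = filter_speed_outliers(trace)
```

A single speed rule like this ignores the environmental conditions the abstract mentions, such as factors that degrade positional accuracy, so it should be read only as an illustration of point-level filtering.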

UPP2: fast and accurate alignment of datasets with fragmentary sequences

Chu

Shen

et al. 2023

Motivation: Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. UPP (Ultra-large multiple sequence alignment using Phylogeny-aware Profiles) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets. Results: We present UPP2, a direct improvement on UPP. The main advance is a fast technique for selecting HMMs in the ensemble that allows us to achieve the same accuracy as UPP but with greatly reduced runtime. We show that UPP2 produces more accurate alignments compared to leading MSA methods on datasets exhibiting substantial sequence length heterogeneity, and is among the most accurate otherwise. Availability: https://github.com/gillichu/sepp Supplementary information: Supplementary information is available online at Bioinformatics.

Potential of Crowdsourced Traces for Detecting Updates in Authoritative Geographic Data

Olteanu‐Raimond

Mustière

et al. 2019

UPP2: Fast and Accurate Alignment Estimation of Datasets with Fragmentary Sequences

Chu

Shen

et al. 2022

Preprint

Motivation: Multiple sequence alignment (MSA) is a basic step in many bioinformatics pipelines. However, achieving highly accurate alignments on large datasets, especially those with sequence length heterogeneity, is a challenging task. UPP (Ultra-large multiple sequence alignment using Phylogeny-aware Profiles) is a method for MSA estimation that builds an ensemble of Hidden Markov Models (eHMM) to represent an estimated alignment on the full-length sequences in the input, and then adds the remaining sequences into the alignment using selected HMMs in the ensemble. Although UPP provides good accuracy, it is computationally intensive on large datasets. Results: We present UPP2, a direct improvement on UPP. The main advance is a fast technique for selecting HMMs in the ensemble that allows us to achieve the same accuracy as UPP but with greatly reduced runtime. We show that UPP2 produces more accurate alignments compared to leading MSA methods on datasets exhibiting substantial sequence length heterogeneity, and is among the most accurate otherwise. Availability: https://github.com/gillichu/sepp Contact: warnow@illinois.edu