Julia Haag scite author profile

Höhler

Bettisworth

et al. 2022

Phylogenetic analyses under the Maximum Likelihood model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. At present, no method exists to quantify and predict this behavior. We introduce a method to quantify the degree of difficulty for analyzing a dataset and present Pythia, a Random Forest Regressor that accurately predicts this difficulty. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating Maximum Likelihood based tree inferences. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyses, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard to-analyze datasets.

From Easy to Hopeless - Predicting the Difficulty of Phylogenetic Analyses

Höhler

Bettisworth

et al. 2022

Preprint

A representative Performance Assessment of Maximum Likelihood based Phylogenetic Inference Tools

Hoehler

Kozlov

et al. 2022

Preprint

The evaluation of phylogenetic inference tools is commonly conducted on simulated and empirical sequence data alignments. An open question is how representative these alignments are with respect to those, commonly analyzed by users. Based upon the RAxMLGrove database, it is now possible to simulate DNA sequences based on more than 70,000 representative RAxML and RAxML-NG tree inferences on empirical datasets conducted on the RAxML web servers. This allows to assess the phylogenetic tree inference accuracy of various inference tools based on realistic and representative simulated DNA alignments. We simulated 20,000 MSAs based on representative datasets (in terms of signal strength) from RAxMLGrove, and used 5,000 datasets from the TreeBASE database, to assess the inference accuracy of FastTree2, IQ-TREE2, and RAxML-NG. We find that on quantifiably difficult-to-analyze MSAs all of the analysed tools perform poorly, such that the quicker FastTree2, can constitute a viable alternative to infer trees. We also find, that there are substantial differences between accuracy results on simulated and empirical data, despite the fact that a substantial effort was undertaken to simulate sequences under as realistic as possible settings.

Simulations of sequence evolution: how (un)realistic they are and why

Trost

Höhler

et al. 2023

Preprint

Motivation: Simulating sequence evolution plays an important role in the development and evaluation of phylogenetic inference tools. Naturally, the simulated data needs to be as realistic as possible to be indicative of the performance of the developed tools on empirical data. Over the years, numerous phylogenetic sequence simulators, employing various models of evolution, have been published with the goal to simulate such empirical-like data. In this study, we simulated DNA and protein Multiple Sequence Alignments (MSAs) under increasingly complex models of evolution with and without insertion/deletion (indel) events using a state-of-the-art sequence simulator. We assessed their realism by quantifying how well supervised learning methods are able to predict whether a given MSA is simulated or empirical. Results: Our results show that we can distinguish between empirical and simulated MSAs with high accuracy using two distinct and independently developed classification approaches across all tested models of sequence evolution. Our findings suggest that the current state-of-the-art models fail to accurately replicate the process of evolution.

The Free Lunch is not over yet – Systematic Exploration of Numerical Thresholds in Phylogenetic Inference

Hübner

Kozlov

et al. 2022

Preprint

Maximum Likelihood (ML) is a widely used model for inferring phylogenies. The respective ML implementations heavily rely on numerical optimization routines that use internal numerical thresholds to determine convergence. We systematically analyze the impact of these threshold settings on the log-likelihood (LnL scores) and runtimes for ML tree inferences with RAxML-NG, IQ-TREE, and FastTree on empirical datasets. We provide empirical evidence that we can substantially accelerate tree inferences with RAxML-NG and IQ-TREE by changing the default values of two such numerical thresholds. At the same time, changing these settings does not significantly influence the quality of the inferred trees according to statistical significance tests. For RAxML-NG, increasing two likelihood thresholds results in an average speedup of 1.9 ± 0.6 on Data collection 1 and 1.8 ± 1.3 on Data collection 2. Increasing one likelihood threshold in IQ-TREE results in an average speedup of 1.3 ± 0.4 on Data collection 1 and 1.3 ± 0.9 on Data collection 2.

Adaptive RAxML-NG: Accelerating Phylogenetic inference under Maximum Likelihood using dataset difficulty

Togkousidis

Kozlov

et al. 2023

Preprint

Motivation: Phylogenetic inferences under the Maximum-Likelihood (ML) criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the difficulty of an MSA with respect to phylogenetic inference. Easy MSAs exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. However, as difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. Results: To this end, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, over-analyzing difficult datasets is hopeless and, thus, it suffices to quickly infer only one of the numerous almost equally likely topologies, to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10x. Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).

Continuous Feature-Based Tracking of the Inner Ear for Robot-Assisted Microsurgery

et al. 2021

Robotic systems for surgery of the inner ear must enable highly precise movement in relation to the patient. To allow for a suitable collaboration between surgeon and robot, these systems should not interrupt the surgical workflow and integrate well in existing processes. As the surgical microscope is a standard tool, present in almost every microsurgical intervention and due to it being in close proximity to the situs, it is predestined to be extended by assistive robotic systems. For instance, a microscope-mounted laser for ablation. As both, patient and microscope are subject to movements during surgery, a well-integrated robotic system must be able to comply with these movements. To solve the problem of on-line registration of an assistance system to the situs, the standard of care often utilizes marker-based technologies, which require markers being rigidly attached to the patient. This not only requires time for preparation but also increases invasiveness of the procedure and the line of sight of the tracking system may not be obstructed. This work aims at utilizing the existing imaging system for detection of relative movements between the surgical microscope and the patient. The resulting data allows for maintaining registration. Hereby, no artificial markers or landmarks are considered but an approach for feature-based tracking with respect to the surgical environment in otology is presented. The images for tracking are obtained by a two-dimensional RGB stream of a surgical microscope. Due to the bony structure of the surgical site, the recorded cochleostomy scene moves nearly rigidly. The goal of the tracking algorithm is to estimate motion only from the given image stream. After preprocessing, features are detected in two subsequent images and their affine transformation is computed by a random sample consensus (RANSAC) algorithm. The proposed method can provide movement feedback with up to 93.2 μm precision without the need for any additional hardware in the operating room or attachment of fiducials to the situs. In long term tracking, an accumulative error occurs.

Feature Detection in Microscope Images for Assisting Systems in Microsurgery: First Results and a Clinical Perspective

Prinzen

Marzi

et al. 2022