Background: Cophylogeny mapping is used to uncover deep coevolutionary associations between two or more phylogenetic histories at a macro coevolutionary scale. As cophylogeny mapping is NP-Hard, this technique relies heavily on heuristics to solve all but the most trivial cases. One notable approach utilises a metaheuristic to search only a subset of the exponential number of fixed node orderings possible for the phylogenetic histories in question. This is of particular interest as it is the only known heuristic that guarantees biologically feasible solutions. This has enabled research to focus on larger coevolutionary systems, such as coevolutionary associations between figs and their pollinator wasps, including over 200 taxa. Although this approach is able to converge on solutions for problem instances of this size, a reduction from the current cubic running time is required to handle larger systems, such as Wolbachia and their insect hosts.

Results: Rather than solving this underlying problem optimally, this work presents a greedy algorithm called TreeCollapse, which uses common topological patterns to recover an approximation of the coevolutionary history where the internal node ordering is fixed. This approach runs in linear time, offering a significant speed-up compared to previous methods. The algorithm has been applied to over 100 well-known coevolutionary systems, converging on Pareto optimal solutions in over 68% of test cases, including some cases where the Pareto optimal solution had not previously been recoverable. Further, while TreeCollapse applies a local search technique, it can guarantee that solutions are biologically feasible, making this the fastest method that can provide such a guarantee.

Conclusion: As a result, we argue that the newly proposed algorithm is a valuable addition to the field of coevolutionary research. Not only does it offer a significantly faster method to estimate the cost of cophylogeny mappings, but when used in conjunction with existing heuristics it can assist in recovering a larger subset of the Pareto front than has previously been possible.
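To make the notion of a Pareto front concrete, the following is a minimal sketch, not the TreeCollapse algorithm itself. It assumes each candidate reconstruction is summarised by a hypothetical cost vector of event counts (duplications, host switches, losses), where fewer events of each type is better; the vector layout and the names `dominates` and `pareto_front` are illustrative assumptions and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical cost vector: (duplications, host_switches, losses).
# Fewer events of each type is better; this layout is an assumption.
Cost = Tuple[int, int, int]


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b on every count and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: List[Cost]) -> List[Cost]:
    """Return the distinct cost vectors that no other vector dominates."""
    unique = sorted(set(costs))
    return [c for c in unique if not any(dominates(o, c) for o in unique)]


# Made-up candidate solutions, for illustration only.
candidates = [(2, 1, 4), (1, 2, 4), (2, 2, 5), (3, 0, 6), (2, 1, 4)]
front = pareto_front(candidates)
print(front)
```

In this toy input, `(2, 2, 5)` is dropped because `(2, 1, 4)` is at least as good on every count and better on two, while the remaining vectors trade one event type against another and so all stay on the front.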
A popular method for coevolutionary inference is cophylogenetic reconstruction where the branch lengths of the phylogenies have been previously derived. This approach, unlike the more generalized reconstruction techniques that are NP-Hard, can reconcile the shared evolutionary history of a pair of phylogenetic trees in polynomial time. While proven to be highly successful, it has a high polynomial running time, which is quickly becoming a limiting factor due to the continual increase in size of coevolutionary data sets. One existing method that combats this issue proposes a trade-off of accuracy for an asymptotic time complexity reduction. This technique converges on Pareto optimal solutions in almost 70% of cases, in linear time. We build on this prior work by proposing an alternate linear time algorithm (RASCAL) that offers a significant accuracy increase: RASCAL converges on Pareto optimal solutions in 85% of cases and, unlike prior methods, can ensure with high probability that all optimal solutions are recovered, provided sufficient replicates are performed.
Traditionally, studies of coevolving systems have considered cases where a parasite may inhabit only a single host. The case where a parasite may infect many hosts, known as widespread parasitism, has until recently gained little traction. This is due in part to the computational complexity of reconstructing coevolutionary histories even where parasites may infect only a single host, a problem that is NP-Hard. Allowing parasites to inhabit more than one host has been seen only to compound this computationally intractable problem further. Recently, however, well-established algorithms for estimating the problem instance where a parasite may infect only a single host have been extended to handle widespread parasites. Although this has offered significant progress, it has been noted that these algorithms poorly handle parasites that inhabit phylogenetically distant hosts.