We present POY version 4, an open source program for the phylogenetic analysis of morphological, prealigned sequence, unaligned sequence, and genomic data. POY allows phylogenetic inference when not only substitutions but also insertions, deletions, and rearrangement events are allowed, with rearrangements computed using the breakpoint or inversion distance. Compared with previous versions, POY 4 provides greater flexibility, a larger number of supported parameter sets, numerous execution time improvements, a vastly improved user interface, greater quality control, and extensive documentation. We introduce POY's basic features and present a simple example illustrating the performance improvements over previous versions of the application.
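The breakpoint distance mentioned above can be illustrated with a minimal sketch. It uses the textbook definition for linear, signed gene orders and is not taken from POY's source; the function name, the framing markers 0 and n + 1, and the assumption that genes are labelled 1..n are all illustrative choices.

```python
def breakpoint_distance(a, b):
    """Count adjacencies of signed gene order `a` that are absent from `b`.

    Assumes both orders are linear chromosomes over genes labelled 1..n,
    with a sign giving each gene's orientation (textbook definition,
    not POY's implementation).
    """
    n = len(a)
    labels = list(range(1, n + 1))
    if sorted(abs(g) for g in a) != labels or sorted(abs(g) for g in b) != labels:
        raise ValueError("both orders must be signed permutations of 1..n")

    # Frame each chromosome with fixed end markers so that changes at the
    # chromosome ends are counted too.
    ext_a = [0] + list(a) + [n + 1]
    ext_b = [0] + list(b) + [n + 1]

    # An adjacency (x, y) read on one strand is the same as (-y, -x)
    # read on the opposite strand, so store both forms.
    adjacencies_b = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacencies_b.add((x, y))
        adjacencies_b.add((-y, -x))

    # Every adjacency of `a` that `b` does not share is one breakpoint.
    return sum((x, y) not in adjacencies_b for x, y in zip(ext_a, ext_a[1:]))
```

For example, inverting the segment `2, 3` of `[1, 2, 3, 4]` gives `[1, -3, -2, 4]`, which breaks the two adjacencies flanking the segment while the adjacency inside it survives on the opposite strand.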
We present POY version 5, an open source program for the phylogenetic analysis of diverse data types including qualitative data, aligned sequences, unaligned sequences, genomic data, and user‐defined sequences. In addition to the maximum‐parsimony optimality criterion supported by POY 4, POY 5 supports several types of maximum likelihood as well as posterior probability. To make these analyses feasible, new heuristic search algorithms and parallelization options have been implemented for all criteria.
Background: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred.
Results: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other competitive algorithms have greater time complexities than DO. Here, we introduce a new algorithm, Affine-DO, and present experiments with it. Affine-DO accommodates the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. It has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement.
Conclusions: Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.
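To make the affine gap edit distance concrete, here is a minimal pairwise sketch using the standard three-state (Gotoh-style) dynamic program. It is not Affine-DO itself, which works on trees and is not specified in the text above. The function name and the default costs (`sub=1`, `gap_open=2`, `gap_extend=1`) are assumptions chosen only for illustration. A gap of length k costs `gap_open + k * gap_extend`.

```python
import math


def affine_edit_distance(s, t, sub=1, gap_open=2, gap_extend=1):
    """Minimum pairwise alignment cost of s and t under affine gap costs.

    Illustrative sketch only: costs are hypothetical defaults, and this is
    the classic pairwise recurrence, not the tree-based Affine-DO heuristic.
    """
    n, m = len(s), len(t)
    inf = math.inf
    # match[i][j]: best cost when s[i-1] is aligned to t[j-1]
    # del_s[i][j]: best cost when s[i-1] is aligned to a gap
    # ins_t[i][j]: best cost when t[j-1] is aligned to a gap
    match = [[inf] * (m + 1) for _ in range(n + 1)]
    del_s = [[inf] * (m + 1) for _ in range(n + 1)]
    ins_t = [[inf] * (m + 1) for _ in range(n + 1)]

    match[0][0] = 0
    for i in range(1, n + 1):
        del_s[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        ins_t[0][j] = gap_open + j * gap_extend

    start_gap = gap_open + gap_extend  # cost of the first position of a new gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = 0 if s[i - 1] == t[j - 1] else sub
            match[i][j] = pair + min(
                match[i - 1][j - 1], del_s[i - 1][j - 1], ins_t[i - 1][j - 1]
            )
            # Extending an open gap is cheaper than starting a new one.
            del_s[i][j] = min(
                match[i - 1][j] + start_gap,
                del_s[i - 1][j] + gap_extend,
                ins_t[i - 1][j] + start_gap,
            )
            ins_t[i][j] = min(
                match[i][j - 1] + start_gap,
                ins_t[i][j - 1] + gap_extend,
                del_s[i][j - 1] + start_gap,
            )

    return min(match[n][m], del_s[n][m], ins_t[n][m])
```

With these illustrative costs, aligning `ACGT` to `AT` is cheapest with a single gap of length two rather than two separate gaps, which is the behaviour an affine model is meant to capture. The recurrence fills three tables of size (n + 1) × (m + 1), so it runs in time proportional to the product of the sequence lengths.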
Novel pathogens have the potential to become critical issues of national security, public health and economic welfare. As demonstrated by the response to Severe Acute Respiratory Syndrome (SARS) and influenza, genomic sequencing has become an important method for diagnosing agents of infectious disease. Despite the value of genomic sequences in characterizing novel pathogens, raw data on their own do not provide the information needed by public health officials and researchers. One must integrate knowledge of the genomes of pathogens with host biology and geography to understand the etiology of epidemics. To these ends, we have created an application called Supramap (http://supramap.osu.edu) to put information on the spread of pathogens and key mutations across time, space and various hosts into a geographic information system (GIS). To build this application, we created a web service for integrated sequence alignment and phylogenetic analysis as well as methods to describe the tree, mutations, and host shifts in Keyhole Markup Language (KML). We apply the application to 239 sequences of the polymerase basic 2 (PB2) gene of recent isolates of avian influenza (H5N1). We map a mutation, glutamic acid to lysine at position 627 in the PB2 protein (E627K), in H5N1 influenza that allows for increased replication of the virus in mammals. We use a statistical test to support the hypothesis of a correlation of E627K mutations with avian-mammalian host shifts but reject the hypothesis that lineages with E627K are moving westward. Data, instructions for use, and visualizations are included as supplemental materials at: http://supramap.osu.edu/sm/supramap/publications. © The Willi Hennig Society 2010.
We have created a web-based workflow application, Supramap (http://supramap.osu.edu). Using a web browser, a user inputs text files containing sequence and/or phenotypic data, latitude and longitude coordinates, and (optionally) a date of isolation for each strain.
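The per-strain inputs listed above (a name, coordinates, and an optional isolation date) map naturally onto KML elements. The sketch below shows one way such records and a tree edge between them could be written as KML with Python's standard library. It is not Supramap's actual output format; the function names, strain names, coordinates, and date are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def strain_placemark(name, lat, lon, date=None):
    """Build a KML Placemark for one strain (hypothetical helper)."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    if date is not None:
        # TimeStamp lets a KML viewer animate placemarks over time.
        timestamp = ET.SubElement(placemark, "TimeStamp")
        ET.SubElement(timestamp, "when").text = date
    point = ET.SubElement(placemark, "Point")
    # KML coordinates are ordered longitude,latitude,altitude.
    ET.SubElement(point, "coordinates").text = f"{lon},{lat},0"
    return placemark


def edge_placemark(name, start, end):
    """Build a KML Placemark drawing a tree edge between two (lat, lon) pairs."""
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    line = ET.SubElement(placemark, "LineString")
    ET.SubElement(line, "coordinates").text = " ".join(
        f"{lon},{lat},0" for lat, lon in (start, end)
    )
    return placemark


def build_kml(placemarks):
    """Wrap placemarks in a KML Document and return it as a string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    document.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical example data, not taken from the Supramap case study.
example = build_kml([
    strain_placemark("strain_A", 21.03, 105.85, date="2005-11-01"),
    strain_placemark("strain_B", 39.9, 116.4),
    edge_placemark("strain_A to strain_B", (21.03, 105.85), (39.9, 116.4)),
])
```

The resulting string can be saved with a `.kml` extension and opened in a viewer that supports KML. A real phylogenetic layer would also need internal tree nodes and annotations for mutations and host shifts, which this sketch omits.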
Our application then executes a workflow on a computing cluster that entails integrated sequence alignment and phylogenetic analysis, computation of character changes (e.g., mutations and host shifts), and geographical projection of the tree. Once the analyses are complete, the user can download a phylogenetic layer expressed as a KML file and view the file with a Geographic Information System (GIS). The user can use the phylogenetic layer to visualize several aspects of pathogen evolution, including spread of lineages, mutations, shifts among hosts, and phenotypic changes over geography and time. We illustrate the use of the system with a case study on H5N1 and discuss the use of visualization in conjunction with statistical validation.
Other tree projection efforts
Supramap is superficially similar to other efforts for projecting phylogenetic trees in GIS, such as