The accurate recovery of evolutionary trees is a major problem in mathematical inference. The relevant theory comes from mathematics (graph theory, combinatorics, and Markov chains), statistics (likelihood, Bayesian methods, resampling, and exploratory data analysis), operations research (optimisation and search heuristics), and computer science (complexity theory). Different ways of presenting molecular data are described and a consistent terminology is presented. The combinatorial explosion in the number of potential trees leads to a description of the complexity of algorithms. The use of simple Markov models for studying the evolution of DNA sequences is described. Methods for inferring trees are analysed under three components: an optimality criterion, a search of tree space, and assumptions or knowledge about the mechanism of evolution (which can allow compensation for multiple, unobserved changes at a site). The analysis of methods in this chapter concentrates on parsimony and distance methods for inferring trees, but also discusses the limited range of cases in which maximum parsimony and maximum likelihood are equivalent. A section describes the use of networks to represent the complexities in the data or to summarise the variability of the output trees. An overview of tree-building methods includes several strategies for searching tree space, from complete searches and branch-and-bound searches to different forms of heuristic approaches.
INTRODUCTION

The mathematics behind phylogenetic programs is interesting in that it comes from several areas of pure and applied mathematics, both discrete and continuous. These include graph theory, combinatorics, and Markov chains from mathematics, together with statistics (likelihood, Bayesian analysis, resampling), operations research (optimisation, search heuristics), and computer science (complexity theory). Fortunately, the underlying concepts are relatively simple (even if the mathematics is complex) and in this chapter we only consider the concepts. Our description of the mathematics is informal, more formal

Handbook of Statistical Genetics, Third Edition. Edited by D. J. Balding, M. Bishop and C. Cannings.