Several indices that measure the degree of balance of a rooted phylogenetic tree have been proposed so far in the literature. In this work we define and study a new index of this kind, which we call the total cophenetic index: the sum, over all pairs of different leaves, of the depth of their lowest common ancestor. This index makes sense for arbitrary trees, can be computed in linear time, and has a larger range of values and greater resolving power than other indices such as Colless' or Sackin's. We compute its maximum and minimum values for arbitrary and binary trees, as well as exact formulas for its expected value for binary trees under the Yule and the uniform models of evolution. As a byproduct of this study, we obtain an exact formula for the expected value of the Sackin index under the uniform model, a result that seems to be new in the literature.
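The definition above can be illustrated with a minimal sketch. It assumes a hypothetical tree encoding that is not part of the paper: a tree is a nested Python tuple whose non-tuple entries are leaf labels, and the root has depth 0. Each internal node contributes its depth times the number of leaf pairs whose lowest common ancestor is exactly that node, so a single traversal suffices.

```python
from math import comb


def total_cophenetic_index(tree):
    """Sum, over all pairs of distinct leaves, of the depth of their LCA.

    Assumed encoding (illustrative only): a tree is a nested tuple,
    and any non-tuple value is a leaf label. The root has depth 0.
    """
    total = 0

    def visit(node, depth):
        nonlocal total
        if not isinstance(node, tuple):
            return 1  # a leaf contributes one descendant leaf
        sizes = [visit(child, depth + 1) for child in node]
        n = sum(sizes)
        # Pairs of leaves that sit in different child subtrees have
        # this node as their lowest common ancestor.
        pairs_here = comb(n, 2) - sum(comb(s, 2) for s in sizes)
        total += depth * pairs_here
        return n

    visit(tree, 0)
    return total


caterpillar = ("a", ("b", ("c", "d")))   # comb with 4 leaves
balanced = (("a", "b"), ("c", "d"))      # fully balanced, 4 leaves
star = ("a", "b", "c", "d")              # rooted star, 4 leaves

print(total_cophenetic_index(caterpillar))  # 4
print(total_cophenetic_index(balanced))     # 2
print(total_cophenetic_index(star))         # 0
```

The recursive traversal is meant for small illustrative trees; very deep trees would need an iterative traversal to stay within Python's recursion limit.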
Background. Phylogenetic tree comparison metrics are an important tool in the study of evolution, and hence the definition of such metrics is an interesting problem in phylogenetics. In a paper in Taxon fifty years ago, Sokal and Rohlf proposed to measure quantitatively the difference between a pair of phylogenetic trees by first encoding them by means of their half-matrices of cophenetic values, and then comparing these matrices. This idea has been used several times since then to define dissimilarity measures between phylogenetic trees but, to our knowledge, no proper metric on weighted phylogenetic trees with nested taxa based on this idea has been formally defined and studied yet. Actually, the cophenetic values of pairs of different taxa alone are not enough to single out phylogenetic trees with weighted arcs or nested taxa. Results. For every (rooted) phylogenetic tree T, let its cophenetic vector φ(T) consist of all pairs of cophenetic values between pairs of taxa in T and all depths of taxa in T. It turns out that these cophenetic vectors single out weighted phylogenetic trees with nested taxa. We then define a family of cophenetic metrics d_{φ,p} by comparing these cophenetic vectors by means of L^p norms, and we study, either analytically or numerically, some of their basic properties: neighbors, diameter, distribution, and their rank correlation with each other and with other metrics. Conclusions. The cophenetic metrics can be safely used on weighted phylogenetic trees with nested taxa and no restriction on degrees, and they can be computed in O(n^2) time, where n stands for the number of taxa. The metrics d_{φ,1} and d_{φ,2} have positively skewed distributions, and they show a low rank correlation with the Robinson-Foulds metric and the nodal metrics, and a very high correlation with each other and with the splitted nodal metrics. The diameter of d_{φ,p}, for p ≥ 1, is in O(n^{(p+2)/p}), and thus for low p they are more discriminative, having a wider range of values.
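As an illustration of the construction described above, the following sketch builds a cophenetic vector and compares two trees with an L^p norm. It covers only a simplified case that is an assumption of this example, not the paper's general setting: unweighted trees, taxa at the leaves only (no nested taxa), and a hypothetical nested-tuple encoding of trees. It is also written for readability rather than for the O(n^2) bound stated in the abstract.

```python
from itertools import combinations


def root_paths(tree, prefix=()):
    """Map each leaf label to the tuple of child indices leading to it."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (i,)))
    return paths


def cophenetic_vector(tree):
    """Depths of all taxa plus cophenetic values of all pairs of taxa.

    Returned as a dict keyed by (a, a) for depths and (a, b) for pairs,
    so that two trees on the same taxa can be compared entry by entry.
    """
    paths = root_paths(tree)
    taxa = sorted(paths)
    vec = {(a, a): len(paths[a]) for a in taxa}  # depth of each taxon
    for a, b in combinations(taxa, 2):
        # The cophenetic value is the depth of the lowest common
        # ancestor, i.e. the length of the shared prefix of the paths.
        shared = 0
        for x, y in zip(paths[a], paths[b]):
            if x != y:
                break
            shared += 1
        vec[(a, b)] = shared
    return vec


def cophenetic_distance(t1, t2, p=1):
    """L^p distance between the cophenetic vectors of two trees."""
    v1, v2 = cophenetic_vector(t1), cophenetic_vector(t2)
    if v1.keys() != v2.keys():
        raise ValueError("both trees must have the same set of taxa")
    return sum(abs(v1[k] - v2[k]) ** p for k in v1) ** (1 / p)


caterpillar = ("a", ("b", ("c", "d")))
balanced = (("a", "b"), ("c", "d"))
print(cophenetic_distance(caterpillar, balanced, p=1))  # 7.0
```

Keying the vector by pairs of labels, rather than by position, avoids any dependence on the order in which the leaves happen to be written in each tree.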
The Colless index is one of the most popular and natural balance indices for bifurcating phylogenetic trees, but it makes no sense for multifurcating trees. In this paper we propose a family of Colless-like balance indices that generalize the Colless index to multifurcating phylogenetic trees. Each is determined by the choice of a dissimilarity D and a weight function f. A balance index is sound when the most balanced phylogenetic trees according to it are exactly the fully symmetric ones. Unfortunately, not every Colless-like balance index is sound in this sense. We then prove that, taking f(n) = ln(n + e) or f(n) = e^n as weight functions, the resulting index is sound for every dissimilarity D. Next, for each one of these two functions f and for three popular dissimilarities D (the variance, the standard deviation, and the mean deviation from the median), we find the most unbalanced phylogenetic trees according to the resulting index with any given number n of leaves. The results show that the growth pace of the function f influences the notion of “balance” measured by the indices it defines. Finally, we introduce our R package “CollessLike,” which, among other functionalities, allows the computation of Colless-like indices of trees and their comparison to their distribution under Chen-Ford-Winkel’s α-γ-model for multifurcating phylogenetic trees. As an application, we show that the trees in TreeBASE do not seem to follow either the uniform model for multifurcating trees or the α-γ-model, for any values of α and γ.
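For reference, the classical Colless index that this family generalizes can be sketched as follows: it is the sum, over all internal nodes of a bifurcating tree, of the absolute difference between the numbers of leaves descending from the two children. The nested-tuple tree encoding is an assumption of this example, and the sketch does not reproduce the Colless-like indices themselves or the CollessLike R package.

```python
def colless_index(tree):
    """Classical Colless index of a bifurcating rooted tree.

    Assumed encoding (illustrative only): internal nodes are 2-tuples,
    and any non-tuple value is a leaf label.
    """
    total = 0

    def count_leaves(node):
        nonlocal total
        if not isinstance(node, tuple):
            return 1
        if len(node) != 2:
            # The classical index is only defined for bifurcating trees.
            raise ValueError("every internal node must have two children")
        left = count_leaves(node[0])
        right = count_leaves(node[1])
        total += abs(left - right)
        return left + right

    count_leaves(tree)
    return total


print(colless_index(("a", ("b", ("c", "d")))))  # 3
print(colless_index((("a", "b"), ("c", "d"))))  # 0
```

The explicit error on nodes with more than two children mirrors the point made in the abstract: the classical definition has no meaning for multifurcating trees.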
We define a new balance index for rooted phylogenetic trees based on the symmetry of the evolutive history of every set of 4 leaves. This index makes sense for multifurcating trees and it can be computed in time linear in the number of leaves. We determine its maximum and minimum values for arbitrary and bifurcating trees, and we provide exact formulas for its expected value and variance on bifurcating trees under Ford's α-model and Aldous' β-model and on arbitrary trees under the α-γ-model.
[...] of isomorphisms of the restriction of T to them (the rooted quartet they define), and then we add up these values over all 4-tuples of different leaves of T. The idea behind the definition of this balance index is that a highly symmetrical evolutive process should give rise to symmetrical evolutive histories of many small subsets of taxa. In terms of phylogenetic trees, this leads us to expect that the more symmetrical a phylogenetic tree is, the more symmetrical its restrictions to subsets of leaves of a fixed cardinality will be. Since the smallest number of leaves yielding enough different tree topologies to allow a meaningful comparison of their symmetry is 4, we assess the balance of a tree by measuring the symmetry of all its rooted quartets and adding up the results. And indeed, in Section 4 below we shall find the trees with maximum and minimum values of our rooted quartet index in both the arbitrary and the bifurcating cases, and it will turn out that the minimum value is reached exactly at the combs (see Fig. 1(a)), which are usually considered the least balanced trees, and the maximum value is reached, in the arbitrary case, exactly at the rooted stars (see Fig. 1(b)) and, in the bifurcating case, exactly at the maximally balanced trees (cf. Fig. 3), which in both cases are considered the most balanced trees.
Besides taking its maximum and minimum values at the expected trees, other important features of our index are that it can be easily computed in linear time and that its mean value and variance can be explicitly computed on any probabilistic model of phylogenetic trees satisfying two natural conditions: independence under relabelings and sampling consistency. This allows us to provide these values for two well-known probabilistic models of bifurcating phylogenetic trees, Ford's α-model [13] and Aldous' β-model [2], which include as specific instances the Yule [14,29] and the uniform [6,24,19] models, as well as for Chen-Ford-Winkel's α-γ-model of multifurcating trees [7]. To our knowledge, this is the first shape index for which closed formulas for the expected value and the variance under the α-γ-model have been provided.
The rest of this paper is organized as follows. In the next section we introduce the basic notations and facts on phylogenetic trees that will be used in the rest of the paper, and we recall several preliminary results on probabilistic models of phylogenetic trees, proving those results for which we have not been able to find a suitable reference in the literature. Then, in Section 3, we ...
Because of its catalytic inefficiencies, Rubisco is the most obvious target for improvement to enhance the photosynthetic capacity of plants. Two hypotheses are tested in the present work: (1) existing Rubiscos have optimal kinetic properties to maximize photosynthetic carbon assimilation in existing higher plants; (2) current knowledge allows proposal of changes to kinetic properties to make Rubiscos more suited to changed conditions in chloroplasts that are likely to occur with climate change. The catalytic mechanism of Rubisco results in higher catalytic rates of carboxylation being associated with decreased affinity for CO2, so that selection for different environments involves a trade-off between these two properties. The simulations performed in this study confirm that the optimality of Rubisco kinetics depends on the species and the environmental conditions. In particular, environmental drivers affecting the CO2 availability for carboxylation (Cc) or directly shifting the photosynthetic limitations between Rubisco and RuBP regeneration determine to what extent Rubisco kinetics are optimally suited to maximize CO2 assimilation rate. In general, modeled values for optimal kinetics reflect the predominant environmental conditions currently encountered by the species in the field. Under future climatic conditions, photosynthetic CO2 assimilation will be limited by RuBP regeneration, especially in scenarios combining the absence of water stress, the largest rise in [CO2] and the lowest increases in temperature. Under these conditions, the model predicts that optimal Rubisco should have high Sc/o and low kcat(c).
One of the main applications of balance indices is in tests of null models of evolutionary processes. The knowledge of an exact formula for a statistic of a balance index, holding for any number n of leaves, is necessary in order to use this statistic in tests of this kind involving trees of any size. In this paper we obtain exact formulas for the variance under the Yule model of the Sackin, the Colless and the total cophenetic indices of binary rooted phylogenetic trees with n leaves.
In this paper, the fuzzy morphological gradients from the fuzzy mathematical morphologies based on t-norms and conjunctive uninorms are analyzed in depth in order to establish which pairs of conjunctions and fuzzy implications are optimal, in accordance with their performance in edge detection applications. A novel three-step algorithm based on the fuzzy morphology is proposed. The comparison is performed by means of the so-called Pratt's figure of merit. In addition, a statistical analysis is carried out to study the relationship between the different configurations and to establish a classification of the conjunctions and implications considered. Both the objective measure and the statistical analysis conclude that the pair formed by the nilpotent minimum t-norm and the Kleene-Dienes implication, and the pair formed by the idempotent uninorm obtained with the classical negation as a generator and its residual implication, are the best configurations in this approach, and that they also obtain competitive results with respect to other approaches.
Background. The Sackin index S of a rooted phylogenetic tree, defined as the sum of its leaves' depths, is one of the most popular balance indices in phylogenetics, and Sackin's 1972 paper is usually cited as the source for this index. However, what Sackin actually proposed in his paper as a measure of the imbalance of a rooted tree was not the sum of its leaves' depths, but their "variation". This proposal was later implemented as the variance of the leaves' depths by Kirkpatrick and Slatkin in 1993, where they also posed the problem of finding a closed formula for its expected value under the Yule model. Nowadays, Sackin's original proposal seems to have passed into oblivion in the phylogenetics literature, replaced by the index bearing his name, which, in fact, was introduced a decade later by Sokal. Results. In this paper we study the properties of the variance of the leaves' depths, V, as a balance index. Firstly, we prove that the rooted trees with n leaves and maximum V value are exactly the combs with n leaves. But although V achieves its minimum value on every space of bifurcating rooted phylogenetic trees with at most 183 leaves at the so-called "maximally balanced trees" with n leaves, this property fails for almost every n larger than 184. We then provide an algorithm that finds the bifurcating rooted trees with n leaves and minimum V value in quasilinear time. Secondly, we obtain closed formulas for the expected V value of a bifurcating rooted tree with any number n of leaves under the Yule and the uniform models and, as a by-product of the computations leading to these formulas, we also obtain closed formulas for the variance under the uniform model of the Sackin index and the total cophenetic index of a bifurcating rooted tree, as well as of their covariance, thus filling this gap in the literature.
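The two quantities contrasted above can be computed side by side with a short sketch. Two choices here are assumptions of the example rather than statements from the paper: the hypothetical nested-tuple tree encoding, and the reading of V as the population variance of the leaves' depths (dividing by n rather than by n - 1). Exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def leaf_depths(tree, depth=0):
    """List the depths of all leaves; the root has depth 0.

    Assumed encoding (illustrative only): a tree is a nested tuple,
    and any non-tuple value is a leaf label.
    """
    if not isinstance(tree, tuple):
        return [depth]
    depths = []
    for child in tree:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def sackin_index(tree):
    """Sackin index S: the sum of the leaves' depths."""
    return sum(leaf_depths(tree))


def depth_variance(tree):
    """V: variance of the leaves' depths (population variance assumed)."""
    depths = leaf_depths(tree)
    n = len(depths)
    # Var = E[d^2] - E[d]^2, written over a common denominator n^2.
    return Fraction(n * sum(d * d for d in depths) - sum(depths) ** 2, n * n)


caterpillar = ("a", ("b", ("c", "d")))
balanced = (("a", "b"), ("c", "d"))
print(sackin_index(caterpillar), depth_variance(caterpillar))  # 9 11/16
print(sackin_index(balanced), depth_variance(balanced))        # 8 0
```

On these two four-leaf trees both quantities rank the comb as less balanced than the maximally balanced tree, in line with the comb maximizing V.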