Abstract. The ease with which genomic data can now be generated using Next Generation Sequencing technologies, combined with a wealth of legacy data, holds great promise for exciting new insights into the evolutionary relationships between and within the kingdoms of life. At the subspecies level (e.g., varieties or strains), dendrograms, that is, certain edge-weighted rooted trees whose leaves are the elements of a set X of organisms under consideration, are often used to represent those relationships. As is well known, dendrograms can be uniquely reconstructed from distances provided all distances on X are known. More often than not, real biological datasets do not satisfy this assumption, implying that the sought dendrogram need not be uniquely determined by the available distances with regard to topology, edge-weighting, or both. To better understand the structural properties a set $L \subseteq \binom{X}{2}$ has to satisfy to overcome this problem, various types of lassos have been introduced. Here, we focus on the question of when a lasso uniquely determines the topology of a dendrogram; that is, it is a topological lasso for its underlying tree. We show that any set-inclusion minimal topological lasso for such a tree T can be transformed into a structurally nice minimal topological lasso for T. Calling such a lasso a distinguished minimal topological lasso for T, we characterize it in terms of the novel concept of a cluster marker map for T. In addition, we present novel results concerning the heritability of such lassos in the context of the subtree and supertree problems.

Key words. dendrogram, block graph, claw-free, topological lasso, X-tree

AMS subject classifications. 05C05, 92D15
DOI. 10.1137/130927644

1. Introduction. In many topical studies in computational biology, ranging from gene ontology [9] via genome-wide association studies in population genetics [22] to evolutionary genomics [21], the following fundamental mathematical problem is encountered: Given a distance D on some set X of objects, find a dendrogram $\mathcal{D}$ on X (essentially a rooted tree T = (V, E) with no degree-two vertices except possibly the root, whose leaf set is X, together with an edge-weighting $\omega : E \to \mathbb{R}_{\geq 0}$; see Figure 2 for examples) such that the distance induced by $\mathcal{D}$ on any two of its leaves x and y equals D(x, y). In the ideal case that the distances between any two elements of X are available, it is well understood when such a tree is uniquely determined by them, and fast algorithms for reconstructing it from them are known (see, e.g., [10, Chapter 9.2] and [28, Chapter 7.2], where dendrograms are considered in the slightly more general forms of dated rooted X-trees and equidistant representations of dissimilarities, respectively, and [2, Chapter 3] as well as the references in all three of these sources for more on this).

The reality, however, tends to be different in many cases in that distances between pairs of objects might be missing or are not sufficiently reliable to warrant inclusion of that distance in an analysis; see, e.g., [25, 26, 29] for mo...