Abstract: Both 'distance' and 'similarity' measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings, and this paper examines whether or not the two are equivalent. These measures are usually parameterised by an atomic 'cost' table, defining label-dependent values for swaps, deletions and insertions. We look at the question of whether orderings induced by a 'distance' measure, with some cost-table, can be dualized by a 'similarity' measure, with some other cost-table, and vice-versa. Three kinds of orderings are considered: alignment-orderings, where for a fixed source S and target T the alternative alignments are ranked; neighbour-orderings, where for a fixed S, varying candidate neighbours T_i are ranked; and pair-orderings, where for varying S_i and varying T_j, the pairings (S_i, T_j) are ranked. We show that (1) alignment-orderings by distance can be dualized by similarity, and vice-versa; (2) neighbour-orderings and pair-orderings by distance can be dualized by similarity; (3) neighbour-orderings and pair-orderings by similarity can sometimes not be dualized by distance. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
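As an informal illustration of the terms used above, the following minimal sketch scores sequence alignments against a cost table, minimising for a 'distance' and maximising for a 'similarity', and then computes a neighbour-ordering for a fixed S. The function names, the `GAP` marker and both tables (`unit_cost`, `unit_score`) are hypothetical choices made for this example, not the tables or algorithms analysed in the paper. The sketch shows only that these two particular tables rank the same neighbours differently; it says nothing about whether some other table could dualize the ordering.

```python
GAP = None  # hypothetical marker: pairing a label with GAP is a deletion or an insertion


def best_alignment_score(s, t, table, pick):
    """Best total score over all alignments of sequences s and t.

    table(x, y) gives the atomic value for pairing label x with label y
    (GAP on one side means deletion or insertion).
    pick is min for a 'distance' and max for a 'similarity'.
    """
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + table(s[i - 1], GAP)  # delete all of s[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + table(GAP, t[j - 1])  # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = pick(
                d[i - 1][j - 1] + table(s[i - 1], t[j - 1]),  # match or swap
                d[i - 1][j] + table(s[i - 1], GAP),           # deletion
                d[i][j - 1] + table(GAP, t[j - 1]),           # insertion
            )
    return d[n][m]


def unit_cost(x, y):
    # Assumed distance table: 0 for identical labels, 1 for swap, deletion, insertion.
    return 0.0 if x == y else 1.0


def unit_score(x, y):
    # Assumed similarity table: 1 for identical labels, 0 for everything else.
    if x is GAP or y is GAP:
        return 0.0
    return 1.0 if x == y else 0.0


def neighbour_ordering(s, candidates, table, pick):
    # Rank candidates from "closest" to "furthest": ascending for a distance,
    # descending for a similarity.
    scored = [(best_alignment_score(s, t, table, pick), t) for t in candidates]
    return [t for _, t in sorted(scored, reverse=(pick is max))]


S = "abc"
neighbours = ["abd", "abcxyz", "b"]
by_distance = neighbour_ordering(S, neighbours, unit_cost, min)
by_similarity = neighbour_ordering(S, neighbours, unit_score, max)
print(by_distance)    # ['abd', 'b', 'abcxyz']
print(by_similarity)  # ['abcxyz', 'abd', 'b']
```

With these tables the distance is the familiar unit-cost edit distance and the similarity is the length of the longest common subsequence, so the long candidate "abcxyz" is the furthest neighbour by distance but the nearest by similarity.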
TREE DISTANCE AND SIMILARITY

In many pattern-recognition scenarios the data either takes the form of, or can be encoded as, sequences or trees. Accordingly, there has been much work on the definition, implementation and deployment of measures for the comparison of sequences and for the comparison of trees. These measures are sometimes described as 'distances' and sometimes as 'similarities'. In what follows we first distinguish between the two, and then address the question of whether orderings induced by a 'distance' measure can be dualized by a 'similarity' measure, and vice-versa. To some extent this can be seen as applying the same kind of analysis to sequence and tree comparison measures as has been applied to set and vector comparison measures (Batagelj and Bren, 1995; Omhover et al., 2005; Lesot and Rifqi, 2010).

From statements such as the following

    To compare RNA structures, we need a score system, or alternatively a distance, which measures the similarity (or the difference) between the structures. These two versions of the problem score and distance are equivalent.