Parallel dynamic programming for solving the string editing problem on a CGM/BSP

Alves, Carlos Eduardo Rodrigues; Cáceres, Edson Norberto; Dehne, Frank

doi:10.1145/564870.564916

Cited by 33 publications

(24 citation statements)

References 17 publications


“…These two versions of the problem score and distance are equivalent. (Herrbach et al, 2006) which are not uncommon in the literature (Alves et al, 2002; Kondrak, 2003; Bose and van der Aalst, 2009), it would be easy to gain the impression that similarity and distance (on sequences and trees) are straightforwardly interchangeable notions. In section 1.1 several distinct kinds of equivalence are defined.…”

Section: Tree Distance and Similarity (mentioning)

confidence: 98%

On Order Equivalences Between Distance and Similarity Measures on Sequences and Trees

Emms

Franco-Penya

2012

Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods


Abstract: Both 'distance' and 'similarity' measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings, and the paper concerns the equivalence or otherwise of these. These measures are usually parameterised by an atomic 'cost' table, defining label-dependent values for swaps, deletions and insertions. We look at the question of whether orderings induced by a 'distance' measure, with some cost-table, can be dualized by a 'similarity' measure, with some other cost-table, and vice-versa. Three kinds of orderings are considered: alignment-orderings, for fixed source S and target T; neighbour-orderings, where for a fixed S, varying candidate neighbours T_i are ranked; and pair-orderings, where for varying S_i and varying T_j, the pairings (S_i, T_j) are ranked. We show that (1) alignment-orderings by distance can be dualized by similarity, and vice-versa; (2) neighbour-ordering and pair-ordering by distance can be dualized by similarity; (3) neighbour-ordering and pair-ordering by similarity can sometimes not be dualized by distance. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.

TREE DISTANCE AND SIMILARITY

In many pattern-recognition scenarios the data either takes the form of, or can be encoded as, sequences or trees. Accordingly, there has been much work on the definition, implementation and deployment of measures for the comparison of sequences and for the comparison of trees. These measures are sometimes described as 'distances' and sometimes as 'similarities'. We are concerned in what follows first with distinguishing between these, and then with the question whether orderings induced by a 'distance' measure can be dualized by a 'similarity' measure, and vice-versa. To some extent this can be seen as applying the same kind of analysis to sequence and tree comparison measures as has been applied to set and vector comparison measures (Batagelj and Bren, 1995; Omhover et al., 2005; Lesot and Rifqi, 2010).

From statements such as the following

"To compare RNA structures, we need a score system, or alternatively a distance, which measures the similarity (or the difference) between the structures. These two versions of the problem score and distance are equivalent."

“…(q1q2), then the second element, P3[1].SV[2], which contains 5, points to the first row (5) in P3 at which the prefix q1q2 changes to another value (q1q3). Since the plan partition is sorted in lexicographical order, its SVA can be constructed in linear time, whenever the number of quantifiers in a query graph is constant.…”

Section: Skip Vector Array (mentioning)

confidence: 99%

“…P3[5].SV[1] (=8) is assigned to P3[4].SV[1], since P3[4].QS[1] (=q1) is equal to P3[5].QS[1]. P3[4].SV[2] is assigned to 5, since P3[4].QS[2] (=q2) does not overlap P3[5].QS (=q1q3q4). Similarly, P3[4].SV[3] is assigned to 5.…”

Section: Skip Vector Array (mentioning)

confidence: 99%

“…Sub-problems in other applications of DP depend on only a fixed number of preceding levels (mostly, two), whereas sub-problems in join enumeration depend on all preceding levels. Thus, existing parallel DP algorithms [2,5,11,34,35] cannot be directly applied to our framework. Therefore, we develop a totally new method for parallelizing DP query optimization, which views join enumeration as a series of self-joins on the MEMO table containing plans for subsets of the tables (or quantifiers).…”

Section: Introduction (mentioning)

confidence: 99%
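
The dependency pattern described in the statement above (each sub-problem needing only a fixed number of preceding levels) can be illustrated with the string editing problem itself. The following is a minimal sequential sketch, assuming unit costs for insertion, deletion and substitution; it is not the CGM/BSP algorithm of the cited paper, and the function name `edit_distance` is chosen here for illustration only. Each row of the table is computed from the row directly before it, so only two rows are kept in memory.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b, keeping only two DP rows."""
    # Row 0: turning the empty prefix of a into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # keep (cost 0) or substitute (cost 1)
            ))
        # Rows older than the current one are never read again.
        prev = curr
    return prev[-1]
```

By contrast, the join enumeration described in the statement reads entries from all earlier levels, which is why a two-row scheme of this kind does not carry over to it.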


Parallelizing query optimization

et al. 2008


Many commercial RDBMSs employ cost-based query optimization exploiting dynamic programming (DP) to efficiently generate the optimal query execution plan. However, optimization time increases rapidly for queries joining more than 10 tables. Randomized or heuristic search algorithms reduce query optimization time for large join queries by considering fewer plans, sacrificing plan optimality. Though commercial systems executing query plans in parallel have existed for over a decade, the optimization of such plans still occurs serially. While modern microprocessors employ multiple cores to accelerate computations, parallelizing query optimization to exploit multi-core parallelism is not as straightforward as it may seem. The DP used in join enumeration belongs to the challenging nonserial polyadic DP class because of its non-uniform data dependencies. In this paper, we propose a comprehensive and practical solution for parallelizing query optimization in the multi-core processor architecture, including a parallel join enumeration algorithm and several alternative ways to allocate work to threads to balance their load. We also introduce a novel data structure called skip vector array to significantly reduce the generation of join partitions that are infeasible. This solution has been prototyped in PostgreSQL. Extensive experiments using various query graph topologies confirm that our algorithms allocate the work evenly, thereby achieving almost linear speed-up. Our parallel join enumeration algorithm enhanced with our skip vector array outperforms the conventional generate-and-filter DP algorithm by up to two orders of magnitude for star queries: linear speedup due to parallelism and an order of magnitude performance improvement due to the skip vector array.


“…We have presented some parallel algorithms for finding the similarity between two strings [4,5,6]. Among these we choose the one that is very efficient in practice [6].…”

Section: Parallel String Similarity (mentioning)

confidence: 99%