Wook-Shin Han scite author profile

An in-depth comparison of subgraph isomorphism algorithms in graph databases

Finding subgraph isomorphisms is an important problem in many applications that deal with data modeled as graphs. While this problem is NP-hard, in recent years many algorithms have been proposed to solve it in a reasonable time for real datasets using different join orders, pruning rules, and auxiliary neighborhood information. However, since most of this work has not empirically compared the algorithms with one another, it is not clear whether the later work outperforms the earlier work. Another problem is that reported comparisons were often done using the original authors' binaries, which were written in different programming environments. In this paper, we address these serious problems by re-implementing five state-of-the-art subgraph isomorphism algorithms in a common code base and by comparing them using many real-world datasets and their query loads. Through our in-depth analysis of experimental results, we report surprising empirical findings.
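For orientation, the problem described in this abstract can be illustrated with a minimal backtracking sketch. It is not one of the five algorithms compared in the paper; the function name `find_embeddings`, the adjacency-dictionary representation, and the naive matching order are all assumptions made for illustration. The algorithms the abstract refers to differ precisely in how they choose the matching order and prune candidates.

```python
def find_embeddings(query_adj, data_adj, query_labels, data_labels):
    """Enumerate injective, label- and edge-preserving maps from query to data.

    Graphs are undirected and given as {vertex: set_of_neighbors}.
    This is a plain backtracking baseline (hypothetical helper, not from the paper).
    """
    order = list(query_adj)  # naive matching order; real algorithms pick this carefully
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        u = order[len(mapping)]  # next query vertex to match
        for v in data_adj:
            if v in mapping.values():  # keep the mapping injective
                continue
            if query_labels[u] != data_labels[v]:  # label pruning
                continue
            # Every already-matched query neighbor of u must map to a neighbor of v.
            if all(mapping[w] in data_adj[v] for w in query_adj[u] if w in mapping):
                mapping[u] = v
                extend(mapping)
                del mapping[u]

    extend({})
    return results


# Query: a single edge A-B. Data: a triangle with labels A, B, B.
query_adj = {"a": {"b"}, "b": {"a"}}
query_labels = {"a": "A", "b": "B"}
data_adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
data_labels = {1: "A", 2: "B", 3: "B"}
embeddings = find_embeddings(query_adj, data_adj, query_labels, data_labels)
```

In this toy input the query vertex `a` can only map to data vertex 1, and `b` can map to either 2 or 3, so two embeddings are found.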


Parallelizing query optimization

Han

Kwak

Lee

et al. 2008

Proc. VLDB Endow.


Many commercial RDBMSs employ cost-based query optimization exploiting dynamic programming (DP) to efficiently generate the optimal query execution plan. However, optimization time increases rapidly for queries joining more than 10 tables. Randomized or heuristic search algorithms reduce query optimization time for large join queries by considering fewer plans, sacrificing plan optimality. Though commercial systems executing query plans in parallel have existed for over a decade, the optimization of such plans still occurs serially. While modern microprocessors employ multiple cores to accelerate computations, parallelizing query optimization to exploit multi-core parallelism is not as straightforward as it may seem. The DP used in join enumeration belongs to the challenging nonserial polyadic DP class because of its non-uniform data dependencies. In this paper, we propose a comprehensive and practical solution for parallelizing query optimization in the multi-core processor architecture, including a parallel join enumeration algorithm and several alternative ways to allocate work to threads to balance their load. We also introduce a novel data structure called the skip vector array to significantly reduce the generation of join partitions that are infeasible. This solution has been prototyped in PostgreSQL. Extensive experiments using various query graph topologies confirm that our algorithms allocate the work evenly, thereby achieving almost linear speed-up. Our parallel join enumeration algorithm enhanced with our skip vector array outperforms the conventional generate-and-filter DP algorithm by up to two orders of magnitude for star queries: linear speedup due to parallelism and an order of magnitude performance improvement due to the skip vector array.
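The "generate-and-filter" DP baseline mentioned in this abstract can be sketched serially as follows. This is a minimal illustration only: it is neither the paper's parallel algorithm nor its skip vector array, and the function name `dp_size`, the uniform per-predicate `selectivity`, and the cost model (sum of intermediate result sizes) are assumptions. The counters show how many candidate pairs are generated only to be filtered out, which is the waste the abstract says the skip vector array targets.

```python
def dp_size(cards, edges, selectivity=0.1):
    """Size-driven join enumeration with generate-and-filter (serial sketch).

    cards: base-table cardinalities; edges: join predicates as (i, j) index pairs.
    Returns (best entry for the full join, pairs examined, pairs feasible).
    """
    n = len(cards)

    def crossing(a, b):
        # Number of join predicates connecting table set a to table set b.
        return sum(1 for i, j in edges
                   if (i in a and j in b) or (i in b and j in a))

    # best[S] = (cost, cardinality, plan string) for each table set S
    best = {frozenset([i]): (0.0, float(cards[i]), f"R{i}") for i in range(n)}
    by_size = {1: list(best)}
    examined = feasible = 0
    for size in range(2, n + 1):
        by_size[size] = []
        for left_size in range(1, size):
            for left in by_size[left_size]:
                for right in by_size[size - left_size]:
                    examined += 1  # generate the candidate pair ...
                    k = crossing(left, right)
                    if left & right or k == 0:
                        continue  # ... then filter overlaps and cross products
                    feasible += 1
                    cost_l, card_l, plan_l = best[left]
                    cost_r, card_r, plan_r = best[right]
                    card = card_l * card_r * selectivity ** k
                    cost = cost_l + cost_r + card
                    key = left | right
                    if key not in best:
                        by_size[size].append(key)
                    if key not in best or cost < best[key][0]:
                        best[key] = (cost, card, f"({plan_l} JOIN {plan_r})")
    return best.get(frozenset(range(n))), examined, feasible


# Chain query R0 - R1 - R2 with made-up cardinalities.
plan, examined, feasible = dp_size([10, 100, 1000], [(0, 1), (1, 2)])
```

For this three-table chain, fewer than half of the generated pairs survive the filter, and the cheapest plan joins the two smallest tables first.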


Autoregressive Image Generation using Residual Quantization

Lee

Kim

et al. 2022


The G* graph database: efficiently managing large distributed dynamic graphs

Labouseur

Birnbaum

Olsen

et al. 2014

Distrib Parallel Databases


Efficient Subgraph Matching

Han

Kim

et al. 2019


Efficient Evaluation of Partial Match Queries for XML Documents Using Information Retrieval Techniques

Whang

Lee

Han

2005


Hybrid Garbage Collection for Multi-Version Concurrency Control in SAP HANA

Lee

Shin

Park

et al. 2016


Progressive optimization in a shared-nothing parallel database

Han

Markl

et al. 2007


Commercial enterprise data warehouses are typically implemented on parallel databases due to the inherent scalability and performance limitations of a serial architecture. Queries used in such large data warehouses can contain complex predicates as well as multiple joins, and the resulting query execution plans generated by the optimizer may be suboptimal due to mis-estimates of row cardinalities. Progressive optimization (POP) is an approach to detect cardinality estimation errors by monitoring actual cardinalities at runtime and to recover by triggering re-optimization with the actual cardinalities measured. However, the original POP solution is based on a serial processing architecture, and its core ideas cannot be readily applied to a parallel shared-nothing environment. Extending the serial POP to a parallel environment is a challenging problem since we need to determine when and how we can trigger re-optimization based on cardinalities collected from multiple independent nodes. In this paper, we present a comprehensive and practical solution to this problem, including several novel voting schemes for deciding whether to trigger re-optimization, a mechanism to reuse local intermediate results across nodes as a partitioned materialized view, several flavors of parallel checkpoint operators, and parallel checkpoint processing methods using efficient communication protocols. This solution has been prototyped in a leading commercial parallel DBMS. We have performed extensive experiments using the TPC-H benchmark and a real-world database. Experimental results show that our solution has negligible runtime overhead and accelerates the performance of complex OLAP queries by up to a factor of 22.
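The idea of deciding on re-optimization from cardinalities observed on several independent nodes can be illustrated with a small sketch. The abstract does not describe the paper's voting schemes, so everything here is an assumption for illustration: the function names, the per-node acceptable range `[low, high]` for the observed row count, and the simple quorum rule.

```python
def node_votes_to_reoptimize(actual_rows, low, high):
    """A node votes for re-optimization when its observed row count
    falls outside the range for which the current plan is assumed adequate."""
    return not (low <= actual_rows <= high)


def should_reoptimize(actual_rows_per_node, low, high, quorum=0.5):
    """Hypothetical quorum vote: trigger re-optimization when more than
    `quorum` of the nodes report an out-of-range cardinality."""
    votes = [node_votes_to_reoptimize(rows, low, high)
             for rows in actual_rows_per_node]
    return sum(votes) > quorum * len(votes)


# Three of four nodes see far more rows than the plan assumed.
trigger = should_reoptimize([50, 5000, 7000, 6000], low=10, high=1000)
# Only one of three nodes is out of range.
no_trigger = should_reoptimize([50, 60, 5000], low=10, high=1000)
```

A quorum rule is only one possible design; it trades sensitivity to a single skewed node against the cost of discarding work on nodes whose estimates were accurate.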


