Purpose
The purpose of this paper is to propose two Extensible Markup Language (XML) data partitioning schemes that handle both static and dynamic allocation for parallel holistic twig joins: the grid metadata model for XML (GMX) and the streams-based partitioning method for XML (SPX).

Design/methodology/approach
GMX exploits the relationships between XML documents and query patterns to perform workload-aware partitioning of XML data. Specifically, the paper constructs a two-dimensional model with a document dimension and a query dimension, in which each object in a dimension is composed of the XML metadata related to that dimension. GMX provides a set of XML data partitioning methods, namely document clustering, query clustering, document-based refinement, query-based refinement, and query-path refinement, which enable XML data to be partitioned on the basis of static XML metadata. In contrast, SPX exploits the structural relationships of query elements and a range-containment property of XML streams to generate partitions and allocate them to cluster nodes on the fly.

Findings
GMX provides several salient features: a set of partition granularities that statically balance query processing costs among cluster nodes; inter-query as well as intra-query parallelism at multiple extents; and better parallel query performance when all estimated queries are executed simultaneously according to their expected probabilities of occurrence in the system. SPX offers the following features: minimal computation time to generate partitions; dynamic balancing of skewed workloads; higher intra-query parallelism; and better parallel query performance.

Research limitations/implications
The proposed XML data partitioning schemes do not currently take XML data updates into account, e.g. new XML documents and changes to the query patterns that users submit to the system.

Practical implications
The effectiveness of the XML data partitioning schemes relies mainly on the accuracy of the cost model used to estimate query processing costs. The cost model must be adjusted to reflect the characteristics of the system platform used in the implementation.

Originality/value
This paper proposes novel XML data partitioning schemes that achieve both static and dynamic workload balance.
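The range-containment idea behind SPX can be illustrated with a minimal sketch. It assumes region-encoded XML nodes (start, end, level), where a descendant's start lies strictly between its ancestor's start and end. It also assumes that every stream is sorted by start and that the elements matching the query root do not nest. The names Node and partition_streams are hypothetical. The even split of the root stream is a simplification, not the paper's allocation policy, which balances workloads dynamically.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """An XML element with a region encoding: (start, end) brackets all descendants."""
    start: int
    end: int
    level: int


def partition_streams(root_stream, other_streams, num_partitions):
    """Split the query-root stream into contiguous chunks and give each chunk
    the nodes of every other stream whose start falls inside the chunk's
    [first root start, last root end] range.

    Assumes every stream is sorted by start (document order) and that the
    root-stream elements do not nest, so chunk ranges never overlap.
    """
    if not root_stream or num_partitions < 1:
        return []
    size = -(-len(root_stream) // num_partitions)  # ceiling division
    # Precompute start positions once per stream for binary search.
    starts = {tag: [n.start for n in s] for tag, s in other_streams.items()}
    partitions = []
    for i in range(0, len(root_stream), size):
        roots = root_stream[i:i + size]
        lo, hi = roots[0].start, roots[-1].end
        sub = {}
        for tag, stream in other_streams.items():
            a = bisect_left(starts[tag], lo)
            b = bisect_right(starts[tag], hi)
            sub[tag] = stream[a:b]  # nodes starting inside [lo, hi]
        partitions.append((roots, sub))
    return partitions


roots = [Node(1, 10, 1), Node(11, 20, 1), Node(21, 30, 1), Node(31, 40, 1)]
bs = [Node(2, 3, 2), Node(12, 13, 2), Node(15, 16, 2), Node(32, 33, 2)]
for roots_i, streams_i in partition_streams(roots, {"b": bs}, 2):
    print(len(roots_i), len(streams_i["b"]))  # prints "2 3", then "2 1"
```

Every match of a twig rooted at one of a chunk's root elements lies inside that root's region. Each partition can therefore be joined independently of the others.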
Parallel XML query processing systems that process numerous queries over large, heterogeneous XML documents often underperform because of workload imbalance and low CPU and system utilization, since conventional partitioning strategies are poorly suited to state-of-the-art query processing algorithms such as holistic twig joins. Consequently, partitioning and distributing heterogeneous XML documents across a parallel cluster system while maintaining good query performance has become an intricate problem. In this paper, we propose XML data partitioning strategies that alleviate the system performance degradation caused by workload imbalance, especially for parallel holistic twig join processing. The proposed strategies aim to improve workload balance for both static and dynamic data distribution. In the first strategy, we refine a high-cost XML partition through a series of partition refinements at various levels of granularity, from documents, queries, and subqueries to node streams. The granularity level chosen for refining a high-cost partition depends on the overall workload balance in the system. In the second strategy, for dynamic data distribution, we handle low system utilization when many nodes in the system are idle. We propose an XML data redistribution approach that partitions XML data on the fly at node-stream granularity.
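The refinement loop of the first strategy can be sketched as follows. The sketch assumes that each partition carries an estimated cost and a list of finer-grained child partitions (for example, a document split into query-level pieces) whose costs sum to the parent's. It places partitions with a greedy longest-processing-time heuristic. It then repeatedly refines the most expensive refinable partition until the heaviest node is within a tolerance of the ideal load. Partition, assign, refine, and the 1.1 tolerance are hypothetical. The paper chooses the granularity level contextually rather than always descending one level on the largest partition.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    cost: float                                   # estimated query-processing cost
    children: list = field(default_factory=list)  # next finer granularity level


def assign(parts, num_nodes):
    """Greedy longest-processing-time placement on the least-loaded node.
    Returns (name -> node index, load of the most loaded node)."""
    heap = [(0.0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {}
    for p in sorted(parts, key=lambda q: q.cost, reverse=True):
        load, node = heapq.heappop(heap)
        placement[p.name] = node
        heapq.heappush(heap, (load + p.cost, node))
    return placement, max(load for load, _ in heap)


def refine(parts, num_nodes, tolerance=1.1):
    """Split the costliest refinable partition until the makespan is within
    tolerance of the ideal per-node load, or nothing is left to refine."""
    parts = list(parts)
    ideal = sum(p.cost for p in parts) / num_nodes  # assumes children sum to parent
    while True:
        placement, makespan = assign(parts, num_nodes)
        refinable = [p for p in parts if p.children]
        if makespan <= tolerance * ideal or not refinable:
            return parts, placement, makespan
        worst = max(refinable, key=lambda p: p.cost)
        parts.remove(worst)
        parts.extend(worst.children)


a = Partition("A", 8, [Partition("A1", 4), Partition("A2", 4)])
parts, placement, makespan = refine([a, Partition("B", 2), Partition("C", 2)], 2)
print(sorted(placement.items()), makespan)
# prints: [('A1', 0), ('A2', 1), ('B', 0), ('C', 1)] 6.0
```

With A kept whole, one node carries a load of 8 against an ideal of 6. After A is refined into two halves, both nodes carry 6.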
Advances in multi-core processor technology have changed the course of computing and enable us to maximize computing performance. In this study, we present a parallel TwigStack algorithm executed on a shared-memory multi-core system to achieve scalable query performance over large XML data. Our proposed scheme has the following features. First, we partition the input streams of XML nodes on the fly for subsequent parallel execution, ensuring that the query solutions in a partition can be obtained by the TwigStack algorithm independently of other partitions. Second, we propose a scheme for estimating the optimal partition size for a given system configuration by taking the L2 cache size into account. Finally, we introduce a partition prefetching technique to alleviate the overhead of on-the-fly partitioning. The experimental results demonstrate that our proposed parallel algorithm works effectively and efficiently. The parallel speedup scales up to the number of available CPU cores.
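The L2-cache-based partition sizing and partition prefetching can be sketched as below. The sizing rule caps one partition's slice of all input streams at a fraction of the L2 cache. It is a hypothetical formula rather than the paper's estimator, and estimate_partition_nodes, prefetched, and cache_fraction are illustrative names. The prefetcher builds partition i+1 on a background thread while the caller processes partition i. In CPython, the global interpreter lock limits how much CPU-bound construction actually overlaps.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_partition_nodes(l2_cache_bytes, bytes_per_node, num_streams,
                             cache_fraction=0.5):
    """Hypothetical rule: nodes per stream such that one partition's slice of
    every input stream fits in cache_fraction of the L2 cache."""
    budget = int(l2_cache_bytes * cache_fraction)
    return max(1, budget // (bytes_per_node * num_streams))


def prefetched(build, count):
    """Yield build(0), ..., build(count - 1), constructing partition i + 1 on a
    background thread while the caller consumes partition i."""
    if count <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(build, 0)
        for i in range(count):
            current = future.result()
            if i + 1 < count:
                future = pool.submit(build, i + 1)  # prefetch the next partition
            yield current


# Example: a 1 MiB L2 cache, 16-byte node records, and a 4-stream twig query.
size = estimate_partition_nodes(1 << 20, 16, 4)
stream = list(range(20000))
num_parts = -(-len(stream) // size)  # ceiling division
lengths = [len(p) for p in prefetched(lambda i: stream[i * size:(i + 1) * size],
                                      num_parts)]
print(size, lengths)  # prints: 8192 [8192, 8192, 3616]
```

Here build(i) simply slices one stream. In a TwigStack setting it would instead produce the i-th independent partition of all query streams.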
Prosiding Use Cases Artificial Intelligence Indonesia (Proceedings of Artificial Intelligence Use Cases in Indonesia) is a book that compiles the results of studies and reports on 26 innovation use cases and 4 initiatives for using artificial intelligence. These are mapped into five artificial intelligence clusters: industrial research and defense and security; public services and health; smart cities and disaster management; food security and maritime affairs; and a cluster of artificial intelligence utilization initiatives. The book's material was obtained from contributors across all quadhelix members and from artificial intelligence practitioners in Indonesia who served as resource persons. The book will help the public gain knowledge and insight into the full range of artificial intelligence technologies that support the related sectors with automation, analysis tools, recommendation and decision making, prediction, and more.