Chaos scales graph processing from secondary storage to multiple machines in a cluster. Earlier systems that process graphs from secondary storage are restricted to a single machine, and therefore limited by the bandwidth and capacity of the storage system on a single machine. Chaos is limited only by the aggregate bandwidth and capacity of all storage devices in the entire cluster. Chaos builds on the streaming partitions introduced by X-Stream in order to achieve sequential access to storage, but parallelizes the execution of streaming partitions. Chaos is novel in three ways. First, Chaos partitions for sequential storage access, rather than for locality and load balance, resulting in much lower pre-processing times. Second, Chaos distributes graph data uniformly randomly across the cluster and does not attempt to achieve locality, based on the observation that in a small cluster network bandwidth far outstrips storage bandwidth. Third, Chaos uses work stealing to allow multiple machines to work on a single partition, thereby achieving load balance at runtime. In terms of performance scaling, on 32 machines Chaos takes on average only 1.61 times longer to process a graph 32 times larger than on a single machine. In terms of capacity scaling, Chaos is capable of handling a graph with 1 trillion edges representing 16 TB of input data, a new milestone for graph processing capacity on a small commodity cluster.
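The placement and work-stealing ideas in this abstract can be illustrated with a toy model. The sketch below is not Chaos code: the function names, the one-partition-per-machine setup, and the one-chunk-per-round cost model are all assumptions made for illustration. It shows uniform random chunk placement, how stealing chunks from the most loaded partition shortens completion time, and the weak-scaling efficiency implied by the reported 1.61x slowdown.

```python
import random


def place_chunks(num_chunks, num_machines, seed=0):
    """Assign each chunk to a machine chosen uniformly at random (no locality)."""
    rng = random.Random(seed)
    return [rng.randrange(num_machines) for _ in range(num_chunks)]


def rounds_to_finish(partition_chunks, steal=True):
    """Toy model: machine i owns partition i and processes one chunk per round.

    With steal=True, a machine whose own partition is empty takes a chunk
    from the partition with the most remaining chunks (hypothetical policy).
    Returns the number of rounds until every partition is empty.
    """
    remaining = list(partition_chunks)
    rounds = 0
    while any(remaining):
        rounds += 1
        for machine in range(len(remaining)):
            if remaining[machine] > 0:
                remaining[machine] -= 1
            elif steal:
                victim = max(range(len(remaining)), key=remaining.__getitem__)
                if remaining[victim] > 0:
                    remaining[victim] -= 1
    return rounds


def weak_scaling_efficiency(slowdown):
    """Fraction of ideal weak scaling when data and machines grow together."""
    return 1.0 / slowdown


if __name__ == "__main__":
    skewed = [10, 2, 2, 2]  # chunks per partition, one partition per machine
    print(rounds_to_finish(skewed, steal=False))
    print(rounds_to_finish(skewed, steal=True))
    print(round(weak_scaling_efficiency(1.61), 2))
```

In this model the skewed input needs as many rounds as its largest partition has chunks when stealing is disabled, and roughly the total chunk count divided by the number of machines when it is enabled. A 1.61x slowdown for 32 times the data on 32 machines corresponds to about 62% of ideal weak scaling.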
Distributed LSM-based databases face throughput and latency issues due to load imbalance across instances and interference from background tasks such as flushing, compaction, and data migration. Hailstorm addresses these problems by deploying the database storage engines over a distributed filesystem that disaggregates storage from processing, enabling storage pooling and compaction offloading. Hailstorm pools storage devices within a rack, allowing each storage engine to fully utilize the aggregate rack storage capacity and bandwidth. Storage pooling successfully handles load imbalance without the need for resharding. Hailstorm offloads compaction tasks to remote nodes, distributing their impact, and improving overall system throughput and response time. We show that Hailstorm achieves load balance in many MongoDB deployments with skewed workloads, improving the average throughput by 60%, while decreasing tail latency by as much as 5×. In workloads with range queries, Hailstorm provides up to 22× throughput improvements. Hailstorm also enables cost savings of 47-56% in OLTP workloads. CCS Concepts • Information systems → Distributed storage; Key-value stores; Relational parallel and distributed DBMSs; Physical data models; • Computer systems organization → Secondary storage organization.
This paper aims to initiate a discussion around benchmarking data management systems with machine-learned components. Traditional benchmarks such as TPC or YCSB are insufficient to analyze and understand these learned systems because they evaluate the performance under a stable workload and data distribution. Learned systems automatically specialize and adapt database components to a changing workload, database, and execution environment, thereby making conventional metrics such as average throughput ill-suited to understand their performance fully. Moreover, the standard cost-per-performance metrics fail to account for essential trade-offs related to the training cost of models and the elimination of manual database tuning. We present several ideas for designing new benchmarks that are better suited to evaluate learned systems. The main challenges entail developing new metrics to capture the particularities of learned systems and ensuring that benchmark results remain comparable across many deployments with wide-ranging designs.
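To make the point about average throughput concrete, the sketch below compares two hypothetical throughput traces with the same average. The metric `adaptation_time`, its tolerance parameter, and the sample numbers are illustrative assumptions and are not taken from the paper; they only show how a time-aware metric can separate systems that an average cannot.

```python
def average_throughput(samples):
    """Mean throughput over all measurement intervals."""
    return sum(samples) / len(samples)


def adaptation_time(samples, shift_index, tolerance=0.1):
    """Intervals needed after a workload shift to recover pre-shift throughput.

    Hypothetical metric: returns the number of intervals after shift_index
    until throughput is within `tolerance` of the pre-shift mean, or None
    if the system never recovers within the trace.
    """
    baseline = sum(samples[:shift_index]) / shift_index
    for offset, value in enumerate(samples[shift_index:]):
        if value >= (1 - tolerance) * baseline:
            return offset
    return None


if __name__ == "__main__":
    # Made-up traces (operations per second); the workload shifts at interval 5.
    static_system = [100] * 10
    learned_system = [120] * 5 + [20, 40, 80, 120, 140]

    print(average_throughput(static_system), average_throughput(learned_system))
    print(adaptation_time(static_system, 5), adaptation_time(learned_system, 5))
```

Both traces average 100 operations per second, yet the second one drops sharply after the shift and needs three intervals to recover, which the average alone hides.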
Table 1. System parameter values used to compute the community accuracy measures.

Communities based on internal proximity information:
  Nr. of trial devices: 66
  Trial days T_k: 110
  r_min: -100 dBm
  Normalization factor α: 100.0
  Meeting threshold β: 1.0
  Temporal decay λ: 0.5
  Aging factor τ: {0.25, 0.5, 0.75}

Reconstructed communities based on external proximity estimates:
  Nr. of trial devices: 66
  Trial days T_k: 110
  Position estimate interval validity f: 5
  Path-loss exponent n: 4.8
  r_min: -100 dBm
  Normalization factor α: 1.0
  Meeting threshold β: 0.0
  Temporal decay λ: 0.5
  Aging factor τ: {0.25, 0.5, 0.75}
Current cluster computing frameworks suffer from load imbalance and limited parallelism due to skewed data distributions, processing times, and machine speeds. We observe that the underlying cause for these issues in current systems is that they partition work statically. Hurricane is a high-performance large-scale data analytics system that successfully tames skew in novel ways. Hurricane performs adaptive work partitioning based on load observed by nodes at runtime. Overloaded nodes can spawn clones of their tasks at any point during their execution, with each clone processing a subset of the original data. This allows the system to adapt to load imbalance and dynamically adjust task parallelism to gracefully handle skew. We support this design by spreading data across all nodes and allowing nodes to retrieve data in a decentralized way. The result is that Hurricane automatically balances load across tasks, ensuring fast completion times. We evaluate Hurricane's performance on typical analytics workloads and show that it significantly outperforms state-of-the-art systems for both uniform and skewed datasets, because it ensures good CPU and storage utilization in all cases. CCS CONCEPTS • Information systems; • Applied computing; • Computer systems organization → Architectures; Dependable and fault-tolerant systems and networks; • Networks; * Nicolas Schiper was with EPFL when this work was performed.
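Task cloning as described in this abstract can be sketched with threads that share one input bag. This is a minimal illustration, not Hurricane's API: the class name `CloningTask`, the `clone_threshold` trigger, and the use of a summation as the task body are assumptions. Each worker (the original task or a clone) pulls chunks from the shared bag, so every clone ends up processing a subset of the original data, and partial results are merged at the end.

```python
import queue
import threading


class CloningTask:
    """A task that spawns clones of itself while its input bag stays large."""

    def __init__(self, chunks, clone_threshold, max_workers):
        self.bag = queue.Queue()
        for chunk in chunks:
            self.bag.put(chunk)
        self.clone_threshold = clone_threshold  # hypothetical overload signal
        self.max_workers = max_workers
        self.lock = threading.Lock()
        self.partials = []
        self.workers = []

    def _spawn(self):
        # Caller must hold self.lock.
        worker = threading.Thread(target=self._work)
        self.workers.append(worker)
        worker.start()

    def _work(self):
        local = 0
        while True:
            try:
                chunk = self.bag.get_nowait()
            except queue.Empty:
                break
            local += sum(chunk)
            # Clone mid-execution if a lot of input is still pending.
            with self.lock:
                overloaded = self.bag.qsize() > self.clone_threshold
                if overloaded and len(self.workers) < self.max_workers:
                    self._spawn()
        with self.lock:
            self.partials.append(local)

    def run(self):
        with self.lock:
            self._spawn()
        joined = 0
        while True:
            with self.lock:
                if joined >= len(self.workers):
                    break
                worker = self.workers[joined]
            worker.join()
            joined += 1
        return sum(self.partials)  # merge the clones' partial results


if __name__ == "__main__":
    task = CloningTask([[1, 2, 3]] * 50, clone_threshold=5, max_workers=4)
    print(task.run())
```

The merged result does not depend on how many clones were spawned or which clone processed which chunk. That property only holds here because the example task is a sum, whose partial results can be combined in any order.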