We present the design and a first performance evaluation of Thrill, a prototype of a general-purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink, with at least two main differences. First, Thrill is based on C++, which enables performance advantages through direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template metaprogramming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirection. Second, Thrill uses arrays rather than multisets as its primary data structure, which enables additional operations such as sorting, prefix sums, window scans, and combining corresponding fields of several arrays (zipping). We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster, and often several times faster, than the other frameworks. At the same time, the source code has a similar level of simplicity and abstraction.
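To make the two ideas above concrete, the following Python sketch is an illustration only and not Thrill's C++ API. It composes a chain of local operations (map, filter) into one per-item function that runs in a single pass without intermediate arrays. This is roughly analogous to the fusion Thrill performs at compile time, although here it is ordinary runtime function composition. The sketch also shows operations that need array order (prefix sum, window scan, zip). The class `Pipeline`, the helpers `prefix_sum`, `window` and `zip_with`, and the sentinel `_DROP` are hypothetical names chosen for this sketch.

```python
from itertools import accumulate

_DROP = object()  # sentinel: the item was removed by a filter stage


class Pipeline:
    """Hypothetical chain of local operations, fused into one per-item function."""

    def __init__(self, stages=()):
        self.stages = tuple(stages)

    def map(self, f):
        return Pipeline(self.stages + (f,))

    def filter(self, pred):
        return Pipeline(self.stages + (lambda x: x if pred(x) else _DROP,))

    def _fused(self):
        stages = self.stages

        def run_one(x):
            # Push one item through every stage; stop early if it was dropped.
            for stage in stages:
                x = stage(x)
                if x is _DROP:
                    break
            return x

        return run_one

    def run(self, items):
        # One pass over the input; no intermediate list between stages.
        f = self._fused()
        return [y for y in map(f, items) if y is not _DROP]


def prefix_sum(a):
    return list(accumulate(a))


def window(a, k, f):
    # Apply f to every run of k consecutive elements.
    return [f(a[i:i + k]) for i in range(len(a) - k + 1)]


def zip_with(f, *arrays):
    # Combine corresponding elements; Python's zip stops at the shortest array.
    return [f(*items) for items in zip(*arrays)]
```

For example, `Pipeline().map(lambda x: x * 2).filter(lambda x: x % 3 != 0).run(range(1, 7))` yields `[2, 4, 8, 10]`, and `prefix_sum` of that result is `[2, 6, 14, 24]`. None of the array helpers would be well defined on an unordered multiset, which is the point of the design choice described above.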
The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi's contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art algorithms, and our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm shows good scalability.
The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi's contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art exact algorithms, and our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm shows good scalability.
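A compact Python sketch of the contraction scheme described above follows. It makes several assumptions. The graph is stored as a dict of dicts with positive integer weights, and nodes are visited in sorted rather than random order. It also applies only two of Padberg and Rinaldi's conditions: contract an edge whose weight is at least the current upper bound, or one whose doubled weight exceeds the weighted degree of at least one endpoint. The second condition uses a strict inequality, assumed here so that all qualifying edges can be contracted at once. The upper bound is the smallest weighted degree seen in any contracted graph, since each contracted node corresponds to a cut of the input. Label propagation can merge the two sides of a minimum cut, so the returned value is only guaranteed to be an upper bound. All function names are hypothetical. The authors' implementation additionally runs in parallel and applies further tests.

```python
from collections import defaultdict


def from_edges(edges):
    """Build an undirected graph {u: {v: weight}} from (u, v, weight) triples."""
    g = defaultdict(dict)
    for u, v, w in edges:
        g[u][v] = g[u].get(v, 0) + w
        g[v][u] = g[v].get(u, 0) + w
    return dict(g)


def min_weighted_degree(g):
    # Every node of a (contracted) graph induces a cut of the input graph.
    return min(sum(nbrs.values()) for nbrs in g.values())


def contract(g, label):
    """Merge nodes with equal labels; drop intra-cluster edges, sum parallel ones."""
    h = {label[v]: {} for v in g}
    for v, nbrs in g.items():
        for u, w in nbrs.items():
            a, b = label[v], label[u]
            if a != b:
                h[a][b] = h[a].get(b, 0) + w
    return h


def label_propagation(g, rounds=3):
    label = {v: v for v in g}
    for _ in range(rounds):
        for v in sorted(g):  # deterministic order (assumption; often randomized)
            score = defaultdict(int)
            for u, w in g[v].items():
                score[label[u]] += w
            if score:
                best = max(score.values())
                if score.get(label[v], -1) < best:  # keep current label on ties
                    label[v] = min(lab for lab, s in score.items() if s == best)
    return label


def padberg_rinaldi_labels(g, bound):
    """Union edges passing the heavy-edge test or the strict endpoint test."""
    parent = {v: v for v in g}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    deg = {v: sum(nbrs.values()) for v, nbrs in g.items()}
    for v, nbrs in g.items():
        for u, w in nbrs.items():
            if w >= bound or 2 * w > min(deg[u], deg[v]):
                parent[find(v)] = find(u)
    return {v: find(v) for v in g}


def near_min_cut(g, rounds=3):
    """Return the value of a cut found by the heuristic (an upper bound on the min cut)."""
    bound = min_weighted_degree(g)
    while len(g) > 2:
        n = len(g)
        g = contract(g, label_propagation(g, rounds))  # heuristic step
        if len(g) < 2:
            break
        bound = min(bound, min_weighted_degree(g))
        g = contract(g, padberg_rinaldi_labels(g, bound))  # safe given the bound
        if len(g) < 2:
            break
        bound = min(bound, min_weighted_degree(g))
        if len(g) == n:
            break  # no further progress
    return bound
```

For instance, take two 4-cliques with edge weight 3 that are joined by a single edge of weight 1. Label propagation contracts each clique into one node, and the sketch returns 1.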
Static mapping is the assignment of parallel processes to the processing elements (PEs) of a parallel system, where the assignment does not change during the application's lifetime. In our scenario, we model an application's computations and their dependencies by an application graph. This graph is first partitioned into (nearly) equally sized blocks, which need to communicate at block boundaries. To assign the processes to PEs, our goal is to compute a communication-efficient bijective mapping between the blocks and the PEs. This approach of partitioning followed by bijective mapping has many degrees of freedom. Thus, users and developers of parallel applications need to know more about which choices work for which application graphs and which parallel architectures. To this end, we not only develop new mapping algorithms (derived from known greedy methods) but also perform extensive experiments involving different classes of application graphs (meshes and complex networks), architectures of parallel computers (grids and tori), as well as different partitioners and mapping algorithms. Surprisingly, the quality of the partitions, unless very poor, has little influence on the quality of the mapping. More importantly, one of our new mapping algorithms always yields the best results in terms of the quality measure maximum congestion when the application graphs are complex networks. In the case of meshes as application graphs, this mapping algorithm always leads in terms of both maximum congestion and maximum dilation, another common quality measure.
Fig. 1. (a) Application graph G_a with a 4-way partition indicated by colors. (b) Communication graph G_c induced by G_a and the partition. G_c expresses the neighborhood relations of G_a's blocks; edge weights (shown through width) indicate communication volumes between blocks. (c) Processor graph G_p. Nodes and edges represent the PEs and the communication links, respectively. Communication between the green and the red block in G_c, i.e., via e_c, requires two hops in G_p.
Motivation. Communication costs are crucial for the scalability of many parallel applications. Static mapping, in turn, is crucial for keeping communication costs under control by (i) providing a partitioning with few edges between blocks and (ii) mapping nearby blocks onto nearby PEs. Due to the sparse nature of many large-scale parallel computers, communication costs may vary by several orders of magnitude depending on the distance between the PEs involved [2]. Also, numerous recent applications involve massive complex networks such as social networks or web graphs [3]. These networks usually lead to denser communication graphs and make improved mapping strategies even more desirable.
Contribution. We investigate numerous algorithms for static mapping, the scenario being that an application graph is first partitioned into blocks, followed by a bijective mapping of the blocks onto the nodes of a processor graph. The graph partitioners we employ are the state-of-the-art packages M...
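To illustrate the quality measures and a greedy bijective mapping of the kind discussed above, the Python sketch below maps the blocks of a small communication graph G_c onto the PEs of a 2D grid processor graph G_p. The greedy rule first puts the block with the largest communication volume on PE 0. It then repeatedly places the block with the most communication to already mapped blocks on the free PE that minimizes weighted hop distance. This rule is an assumption for illustration and not necessarily one of the authors' new algorithms. Dilation is taken here as the hop distance between the PEs of two communicating blocks, as for e_c in Fig. 1. Congestion is taken as the total communication volume routed over a single link under dimension-order (XY) routing. The routing model and all function names are assumptions.

```python
from itertools import product


def grid_pes(rows, cols):
    """PE coordinates of a rows x cols grid; PE id = row * cols + col."""
    return list(product(range(rows), range(cols)))


def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def greedy_map(gc, n_blocks, pes):
    """Hypothetical greedy bijection from blocks 0..n_blocks-1 onto PE ids."""
    if n_blocks != len(pes):
        raise ValueError("a bijective mapping needs as many blocks as PEs")
    nbrs = {b: {} for b in range(n_blocks)}
    for (a, b), w in gc.items():
        nbrs[a][b] = nbrs[a].get(b, 0) + w
        nbrs[b][a] = nbrs[b].get(a, 0) + w
    volume = {b: sum(nbrs[b].values()) for b in nbrs}
    mapping = {max(volume, key=lambda b: (volume[b], -b)): 0}  # heaviest block on PE 0
    free = set(range(1, len(pes)))
    while len(mapping) < n_blocks:
        # Next block: most communication with the blocks mapped so far.
        gain = {b: sum(w for u, w in nbrs[b].items() if u in mapping)
                for b in nbrs if b not in mapping}
        nxt = max(gain, key=lambda b: (gain[b], -b))

        def cost(p):
            # Weighted hop distance from PE p to the PEs of mapped neighbors.
            return sum(w * hops(pes[p], pes[mapping[u]])
                       for u, w in nbrs[nxt].items() if u in mapping)

        pe = min(free, key=lambda p: (cost(p), p))
        mapping[nxt] = pe
        free.remove(pe)
    return mapping


def xy_route(a, b):
    """Grid links used by dimension-order routing: columns first, then rows."""
    (r, c), links = a, []
    while c != b[1]:
        step = 1 if b[1] > c else -1
        links.append(tuple(sorted([(r, c), (r, c + step)])))
        c += step
    while r != b[0]:
        step = 1 if b[0] > r else -1
        links.append(tuple(sorted([(r, c), (r + step, c)])))
        r += step
    return links


def evaluate(gc, mapping, pes):
    """Return (maximum dilation, weighted hop sum, maximum congestion)."""
    load, max_dil, total = {}, 0, 0
    for (a, b), w in gc.items():
        pa, pb = pes[mapping[a]], pes[mapping[b]]
        d = hops(pa, pb)
        max_dil = max(max_dil, d)
        total += w * d
        for link in xy_route(pa, pb):
            load[link] = load.get(link, 0) + w
    return max_dil, total, max(load.values(), default=0)
```

For example, take `gc = {(0, 1): 5, (1, 2): 5, (2, 3): 5, (0, 3): 1}` on a 2×2 grid. There, `evaluate` returns `(2, 22, 11)` for the identity mapping and `(1, 16, 5)` for the mapping produced by `greedy_map`, which places every communicating pair on adjacent PEs.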
The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. In this paper, we engineer the fastest known exact algorithm for the problem. State-of-the-art algorithms such as the algorithm of Padberg and Rinaldi or the algorithm of Nagamochi, Ono and Ibaraki identify edges that can be contracted to reduce the graph size such that at least one minimum cut is maintained in the contracted graph. Our algorithm achieves improvements in running time over these algorithms through a multitude of techniques. First, we use a recently developed fast and parallel inexact minimum cut algorithm to obtain a better bound for the problem. Then we use reductions that depend on this bound to reduce the size of the graph much faster than previously possible. We use improved data structures to further lower the running time of our algorithm. Additionally, we parallelize the contraction routines of Nagamochi, Ono and Ibaraki. Overall, we arrive at a system that significantly outperforms the fastest state-of-the-art solvers for the exact minimum cut problem.
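The bound-dependent contraction in the style of Nagamochi, Ono and Ibaraki can be sketched as follows. Nodes are scanned in maximum-adjacency order. When an edge e = (x, y) is scanned, q(e) is the current total weight from y to the already scanned nodes, which is a lower bound on the connectivity λ(x, y). If q(e) is at least a known upper bound λ̂ on the minimum cut, contracting e cannot destroy every minimum cut unless λ̂ is already optimal. A tighter λ̂, for example one supplied by an inexact algorithm, therefore allows more contractions per round. This is a sequential Python sketch assuming positive integer weights in a dict-of-dicts graph. The parameter `bound` stands in for the inexact algorithm's result and must be the value of an actual cut. The function names are hypothetical. The sketch omits the paper's further reductions, data structures, and parallelization.

```python
import heapq
from collections import defaultdict


def from_edges(edges):
    """Build an undirected graph {u: {v: weight}} from (u, v, weight) triples."""
    g = defaultdict(dict)
    for u, v, w in edges:
        g[u][v] = g[u].get(v, 0) + w
        g[v][u] = g[v].get(u, 0) + w
    return dict(g)


def min_weighted_degree(g):
    return min(sum(nbrs.values()) for nbrs in g.values())


def contract(g, label):
    """Merge nodes with equal labels; drop intra-cluster edges, sum parallel ones."""
    h = {label[v]: {} for v in g}
    for v, nbrs in g.items():
        for u, w in nbrs.items():
            if label[v] != label[u]:
                h[label[v]][label[u]] = h[label[v]].get(label[u], 0) + w
    return h


def ma_order_bounds(g):
    """Scan nodes in maximum-adjacency order; q[(x, y)] <= lambda(x, y)."""
    r = {v: 0 for v in g}
    visited, q = set(), {}
    for s in g:  # restart once per connected component
        if s in visited:
            continue
        heap = [(0, s)]  # max-heap via negated r values
        while heap:
            neg_r, x = heapq.heappop(heap)
            if x in visited or -neg_r != r[x]:
                continue  # stale heap entry
            visited.add(x)
            for y, w in g[x].items():
                if y not in visited:
                    r[y] += w
                    q[(x, y)] = r[y]
                    heapq.heappush(heap, (-r[y], y))
    return q


def _find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def exact_min_cut(g, bound=None):
    """Minimum cut value; `bound` is any known cut value (e.g. from a heuristic)."""
    best = min_weighted_degree(g)
    if bound is not None:
        best = min(best, bound)
    while len(g) > 1:
        best = min(best, min_weighted_degree(g))
        parent = {v: v for v in g}
        for (x, y), qe in ma_order_bounds(g).items():
            if qe >= best:  # lambda(x, y) >= best: a min cut survives, or best is optimal
                parent[_find(parent, x)] = _find(parent, y)
        h = contract(g, {v: _find(parent, v) for v in g})
        if len(h) == len(g):
            break  # only happens when g has no edges, and then best == 0
        g = h
    return best
```

Each round contracts at least one edge whenever edges remain. The last node t in the order receives q = d(t) on the edge to its last-scanned neighbor, and d(t) is at least the minimum degree, hence at least `best`. Passing a tighter `bound` can only increase the number of edges contracted per round.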