Optimistic parallelism requires abstractions

Kulkarni, Milind; Pingali, Keshav; Walter, Bruce; Ramanarayanan, Ganesh; Bala, Kavita; Chew, L. Paul

doi:10.1145/1273442.1250759

Cited by 139 publications (146 citation statements)

References 45 publications

Supporting

Mentioning

143

Contrasting


“…A few larger workloads that target TM systems have been designed to address the disparity between microbenchmarks and full applications: Delaunay mesh generation [35], database management [14], BerkeleyDB [12], maze routing [40], and Delaunay mesh refinement and agglomerative clustering [24]. These applications represent realistic workloads and avoid the pitfalls of microbenchmarks.…”

Section: B Transactional Memory Benchmarks (mentioning)

confidence: 99%


STAMP: Stanford Transactional Applications for Multi-Processing

Minh, Chung, Kozyrakis, et al. 2008

2008 IEEE International Symposium on Workload Characterization


Transactional Memory (TM) is emerging as a promising technology to simplify parallel programming. While several TM systems have been proposed in the research literature, we are still missing the tools and workloads necessary to analyze and compare the proposals. Most TM systems have been evaluated using microbenchmarks, which may not be representative of any real-world behavior, or individual applications, which do not stress a wide range of execution scenarios. We introduce the Stanford Transactional Application for Multi-Processing (STAMP), a comprehensive benchmark suite for evaluating TM systems. STAMP includes eight applications and thirty variants of input parameters and data sets in order to represent several application domains and cover a wide range of transactional execution cases (frequent or rare use of transactions, large or small transactions, high or low contention, etc.). Moreover, STAMP is portable across many types of TM systems, including hardware, software, and hybrid systems. In this paper, we provide descriptions and a detailed characterization of the applications in STAMP. We also use the suite to evaluate six different TM systems, identify their shortcomings, and motivate further research on their performance characteristics.


“…The usage of transactions in yada is similar to that in [24], but it is applied to a different algorithm in this benchmark. Accesses to the work queue are enclosed by a transaction as is the entire refinement of a skinny triangle.…”

Section: B Applications (mentioning)

confidence: 99%

STAMP: Stanford Transactional Applications for Multi-Processing

Minh, Chung, Kozyrakis, et al. 2008

2008 IEEE International Symposium on Workload Characterization


“…The experiments were performed in a system with 2 Intel Xeon E5-2660 Sandy Bridge-EP CPUs (8 cores/CPU) at 2.2 GHz and 64 GB of RAM, using g++ 4.7.2 with optimization flag -O3. The graph, node and edge classes used were taken from the Galois system [3], as they were found to be more efficient than the locally developed ones used in [5] and the skeleton transparently supports any classes. The inputs were a road map of the USA with 24 million nodes and 58 million edges for Boruvka, IS and ST, a road map of New York City with 264 thousand nodes and 733 thousand edges for SSSP (both maps taken from [22]), and a mesh with 1 million triangles taken from the Galois project for DMR.…”

Section: Discussion (mentioning)

confidence: 99%

“…RELATED WORK While we have not found any other skeleton-based approach oriented to the parallelization of this kind of applications, there are proposals with this aim. The Galois system [3] is a framework for this kind of algorithms that relies on user annotations that describe the properties of the operations. Its interface can be simplified though, if only cautious and unordered algorithms are considered.…”

Section: Discussion (mentioning)

confidence: 99%

“…For example, [3] proposes a framework that relies on user annotations that describe the properties of the operations, while [4] is based on a language to describe the evolution of the working set associated to each parallel task. This paper focuses on [5], a parallel algorithmic skeleton [6] [7] that allows to parallelize a large class of irregular applications with little programmer effort by applying a data parallel approach extended with new abstractions [8].…”

Section: Introduction (mentioning)

confidence: 99%


Enhancing and Evaluating the Configuration Capability of a Skeleton for Irregular Computations

González, Fraguela 2015

2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing


Although skeletons largely facilitate the parallelization of algorithms, they often provide little support for the work decomposition. Also, while they have been widely applied to regular computations, this has not been the case for irregular algorithms that can exploit amorphous data-parallelism, whose parallelization in fact requires much more effort from programmers and thus benefits more from a structured approach. In this paper we improve and evaluate the configurability of a recently proposed skeleton that allows this latter kind of algorithm to be parallelized. Namely, the skeleton makes it easy to change critical details such as the data structures, the work partitioning algorithm or the task granularity to use. The simple procedures to choose among these possibilities and their influence on performance are described and evaluated. We conclude that the skeleton makes it convenient to explore different possibilities for the parallelization of irregular applications, which can result in substantial performance improvements.
