Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation 2011
DOI: 10.1145/1993498.1993554
Automatic parallelization via matrix multiplication

Abstract: Existing work on the parallelization of complicated reductions and scans focuses on formalism and hardly deals with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have refined our framework by developing two sets of techniques. One enhances its capability for para…
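As context for the abstract, here is a minimal sketch (my own illustration, not the paper's framework) of the idea the title refers to: a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] looks inherently sequential, but each step can be encoded as a 2x2 matrix, and because matrix multiplication is associative, the chain of matrices can be combined by a parallel tree reduction. All names below are hypothetical.

    def step_matrix(a, b):
        # One recurrence step acting on the vector (x, 1):
        # [a b] [x]   [a*x + b]
        # [0 1] [1] = [   1   ]
        return ((a, b), (0, 1))

    def matmul(m, n):
        # 2x2 matrix product m @ n.
        (a, b), (c, d) = m
        (e, f), (g, h) = n
        return ((a*e + b*g, a*f + b*h),
                (c*e + d*g, c*f + d*h))

    def tree_reduce(ms):
        # Associative reduction; the products at each level are
        # independent of one another, so they could run in parallel.
        while len(ms) > 1:
            ms = [matmul(ms[i+1], ms[i]) if i + 1 < len(ms) else ms[i]
                  for i in range(0, len(ms), 2)]
        return ms[0]

    a, b = [2, 3, 5, 7, 11], [1, 4, 1, 5, 9]
    x = 0
    for ai, bi in zip(a, b):          # sequential reference
        x = ai * x + bi
    m = tree_reduce([step_matrix(ai, bi) for ai, bi in zip(a, b)])
    assert m[0][1] == x               # top-right entry is x[n] when x[0] = 0

The paper's contribution is to derive and optimize such matrix formulations automatically inside a compiler, rather than by hand as above.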

Cited by 16 publications (6 citation statements); citing publications appeared in 2012 and 2019. References 16 publications.
“…It is possible to translate such a program by extracting the true dependences using techniques such as Array Dataflow Analysis [16]. Moreover, if the reductions are not provided, we need to use reduction detection techniques [17,18,19]. Also, there are probably links between the notions introduced in both algorithms, such as our accept domain and their relation R_want, corresponding to the “desired correspondence between the iterations of both computations”.…”
Section: Results
confidence: 99%
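To make the quoted point about reduction detection concrete, here is a small illustration (my own, using the maximum-prefix-sum example that is standard in this literature, not code from the cited papers). A plain dependence test sees two loop-carried accumulators that depend on each other and gives up; reduction detection recasts the loop body as an associative operator on (sum, mps) pairs, which a compiler can then tree-reduce in parallel.

    from functools import reduce

    def mps_sequential(xs):
        s = m = 0
        for x in xs:
            s = s + x         # running sum: depends on previous s
            m = max(m, s)     # running max: depends on previous m AND s
        return m

    def combine(left, right):
        # Associative operator on (segment sum, segment mps) pairs:
        # a prefix of the concatenation is either a prefix of the left
        # part, or all of the left part plus a prefix of the right part.
        ls, lm = left
        rs, rm = right
        return (ls + rs, max(lm, ls + rm))

    def mps_parallel(xs):
        return reduce(combine, [(x, max(0, x)) for x in xs], (0, 0))[1]

    xs = [1, -2, 3, -1, 2]
    assert mps_sequential(xs) == mps_parallel(xs)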
“…Benchmarks: Table 3 shows key features of the algorithms used in our experiments. All of these algorithms are collected from previous research efforts in this area [12,20,21]. For other details of these algorithms, readers are referred to [12].…”
Section: Methods
confidence: 99%
“…Figure 3 shows the relationship between the number of thread blocks and the speedups of GPU implementations over sequential programs for a set of representative algorithms. These algorithms are collected from previous research work [12,20,21]. Some key features of these algorithms are shown in Table 3.…”
Section: GPU Specific Needs
confidence: 99%
“…The general problem of fully automatic parallelization by compilers is extremely complex and remains a grand challenge [55]. Many efforts attempt to solve it only in certain contexts, e.g., for divide and conquer [56], recursive functions [57], distributed architectures [58], graphics processing [59], matrix manipulation [60], asking the developer for assistance [61], and speculative strategies [62]. Our approach focuses on MapReduce-style code over native data containers in a shared memory space using a mainstream programming language, which may be more amenable to parallelization due to more explicit data dependencies [16].…”
Section: Related Work
confidence: 99%
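To illustrate the contrast this citation draws, here is a hypothetical sketch (mine, not from the cited work) of why MapReduce-style code over data containers can be easier to parallelize: the map stage has no cross-element dependencies at all, and the reduce stage is an explicitly associative combine, so no dependence proof is needed.

    from functools import reduce

    data = [3, 1, 4, 1, 5, 9, 2, 6]

    # Loop form: the accumulator creates a loop-carried dependence that
    # a tool must first prove to be a reduction.
    total = 0
    for v in data:
        total += v * v

    # MapReduce form: dependencies are explicit in the structure itself.
    total_mr = reduce(lambda a, b: a + b, map(lambda v: v * v, data), 0)

    assert total == total_mr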