Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation 2011
DOI: 10.1145/1993498.1993554
Automatic parallelization via matrix multiplication

Abstract: Existing work on the parallelization of complicated reductions and scans focuses on formalism and hardly deals with implementation. To bridge the gap between formalism and implementation, we have integrated parallelization via matrix multiplication into compiler construction. Our framework can deal with complicated loops that existing techniques in compilers cannot parallelize. Moreover, we have refined our framework by developing two sets of techniques. One enhances its capability for para…
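As context for the abstract, here is a minimal sketch (my own illustration, not the paper's framework) of the idea the title refers to: a first-order linear recurrence x[i] = a[i]*x[i-1] + b[i] looks inherently sequential, but each step can be encoded as a 2x2 matrix, and because matrix multiplication is associative, the chain of matrices can be combined by a parallel tree reduction. All names below are hypothetical.

    def step_matrix(a, b):
        # One recurrence step acting on the vector (x, 1):
        # [a b] [x]   [a*x + b]
        # [0 1] [1] = [   1   ]
        return ((a, b), (0, 1))

    def matmul(m, n):
        # 2x2 matrix product m @ n.
        (a, b), (c, d) = m
        (e, f), (g, h) = n
        return ((a*e + b*g, a*f + b*h),
                (c*e + d*g, c*f + d*h))

    def tree_reduce(ms):
        # Associative reduction; the products at each level are
        # independent of one another, so they could run in parallel.
        while len(ms) > 1:
            ms = [matmul(ms[i+1], ms[i]) if i + 1 < len(ms) else ms[i]
                  for i in range(0, len(ms), 2)]
        return ms[0]

    a, b = [2, 3, 5, 7, 11], [1, 4, 1, 5, 9]
    x = 0
    for ai, bi in zip(a, b):          # sequential reference
        x = ai * x + bi
    m = tree_reduce([step_matrix(ai, bi) for ai, bi in zip(a, b)])
    assert m[0][1] == x               # top-right entry is x[n] when x[0] = 0

The paper's contribution is to derive and optimize such matrix formulations automatically inside a compiler, rather than by hand as above.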

Cited by 16 publications (6 citation statements); citing publications appeared in 2012 and 2019. References 16 publications.
“…It is possible to translate such a program by extracting the true dependences using techniques such as Array Dataflow Analysis [16]. Moreover, if the reductions are not provided, we need to use reduction detection techniques [17,18,19]. Also, there are probably links between the notions introduced in both algorithms, such as our accept domain and their relation R_want, corresponding to the “desired correspondence between the iterations of both computations”.…”
Section: Results
confidence: 99%
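To make the quoted point about reduction detection concrete, here is a small illustration (my own, using the maximum-prefix-sum example that is standard in this literature, not code from the cited papers). A plain dependence test sees two loop-carried accumulators that depend on each other and gives up; reduction detection recasts the loop body as an associative operator on (sum, mps) pairs, which a compiler can then tree-reduce in parallel.

    from functools import reduce

    def mps_sequential(xs):
        s = m = 0
        for x in xs:
            s = s + x         # running sum: depends on previous s
            m = max(m, s)     # running max: depends on previous m AND s
        return m

    def combine(left, right):
        # Associative operator on (segment sum, segment mps) pairs:
        # a prefix of the concatenation is either a prefix of the left
        # part, or all of the left part plus a prefix of the right part.
        ls, lm = left
        rs, rm = right
        return (ls + rs, max(lm, ls + rm))

    def mps_parallel(xs):
        return reduce(combine, [(x, max(0, x)) for x in xs], (0, 0))[1]

    xs = [1, -2, 3, -1, 2]
    assert mps_sequential(xs) == mps_parallel(xs)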
“…Benchmarks: Table 3 shows key features of the algorithms used in our experiments. All of these algorithms are collected from previous research efforts in this area [12,20,21]. For other details of these algorithms, readers are referred to [12].…”
Section: Methods
confidence: 99%
“…Figure 3 shows the relationship between the number of thread blocks and the speedups of GPU implementations over sequential programs for a set of representative algorithms. These algorithms are collected from previous research work [12,20,21]. Some key features of these algorithms are shown in Table 3.…”
Section: GPU Specific Needs
confidence: 99%
“…The general problem of fully automatic parallelization by compilers is extremely complex and remains a grand challenge [55]. Many efforts attempt to solve it only in certain contexts, e.g., for divide and conquer [56], recursive functions [57], distributed architectures [58], graphics processing [59], matrix manipulation [60], asking the developer for assistance [61], and speculative strategies [62]. Our approach focuses on MapReduce-style code over native data containers in a shared memory space using a mainstream programming language, which may be more amenable to parallelization due to more explicit data dependencies [16].…”
Section: Related Work
confidence: 99%
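To illustrate the contrast this citation draws, here is a hypothetical sketch (mine, not from the cited work) of why MapReduce-style code over data containers can be easier to parallelize: the map stage has no cross-element dependencies at all, and the reduce stage is an explicitly associative combine, so no dependence proof is needed.

    from functools import reduce

    data = [3, 1, 4, 1, 5, 9, 2, 6]

    # Loop form: the accumulator creates a loop-carried dependence that
    # a tool must first prove to be a reduction.
    total = 0
    for v in data:
        total += v * v

    # MapReduce form: dependencies are explicit in the structure itself.
    total_mr = reduce(lambda a, b: a + b, map(lambda v: v * v, data), 0)

    assert total == total_mr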