Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation - PLDI '03 2003
DOI: 10.1145/781163.781165

Region-based hierarchical operation partitioning for multicluster processors

Abstract: Clustered architectures are a solution to the bottleneck of centralized register files in superscalar and VLIW processors. The main challenge associated with clustered architectures is compiler support to effectively partition operations across the available resources on each cluster. In this work, we present a novel technique for clustering operations based on graph partitioning methods. Our approach incorporates new methods of assigning weights to nodes and edges within the dataflow graph to guide the partit…

Cited by 19 publications (32 citation statements)
References 10 publications (11 reference statements)

“…One important category is the code generation for loops [1,2,10,20,25] by means of modulo scheduling techniques [9,23]. Another category schedules instructions for more general program structures including cyclic and acyclic control flow graphs [6,8,12,13,16,17,21]. In this paper, we focus on the latter category.…”
Section: 3 (mentioning)
confidence: 99%
“…VLIW) [6,13,8,12,16,17,21], where the compiler is responsible for both code scheduling and instruction distribution among clusters. However, as we will show later in this paper, the software-only approach performs much worse than its hardware-only counterpart when it is applied to out-of-order processors.…”
Section: Introduction (mentioning)
confidence: 99%
“…Codina et al [24] used a similar strategy as UAS but focused on modulo scheduling. Using graph partitioners for clustering operations has also been investigated in several studies [5,25]. Most of these studies were much different from our work, since our architecture has more features than clustering, and we mainly focus on register allocation targeted toward acyclic code to tie in with the phase ordering in ORC infrastructure.…”
Section: Related Work (mentioning)
confidence: 99%
“…The only drawback of such an architecture is the intercluster communication cost. Various groups [2,3,4,6,9,10] have studied cluster assignment mechanisms for one thread to reduce the overhead of inter-cluster communication. We extend their ideas to a clustered architecture with multiple threads.…”
Section: Related Work (mentioning)
confidence: 99%