Proceedings Fifth International Symposium on High-Performance Computer Architecture 1999
DOI: 10.1109/hpca.1999.744349

Distributed modulo scheduling

Abstract: Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units. Organizations composed by clusters of a few functional units and small private register files have been proposed to deal with this problem, an approach highly dependent on scheduli…

Cited by 33 publications (16 citation statements)
References: 22 publications
“…Several research groups (see, e.g., Nystrom and Eichenberger [1998]; Fernandes et al [1999], and Sanchez and Gonzàlez [2001]) address binding in the context of modulo scheduling algorithms. The objective of modulo scheduling is to software pipeline the inner loop body (i.e., derive a retiming function for its operations), as well as determine adequate binding and scheduling functions, so as to minimize the loop's initiation interval (i.e., maximize throughput).…”
Section: Previous Work (mentioning)
confidence: 99%
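
The objective quoted above, minimizing the initiation interval (II), is usually approached by first computing a lower bound on II and then searching upward from it. The following is a minimal sketch of that standard lower bound from the general modulo scheduling literature; it is not taken from the cited paper or the citing text. The function names (`res_mii`, `rec_mii`, `min_ii`) and the input shapes are assumptions chosen for illustration.

```python
from math import ceil

def res_mii(op_counts, unit_counts):
    """Resource bound: each resource class needs ceil(ops / units) cycles per iteration.

    op_counts and unit_counts are hypothetical dicts keyed by resource class.
    """
    return max((ceil(op_counts[k] / unit_counts[k]) for k in op_counts), default=1)

def rec_mii(cycles):
    """Recurrence bound over dependence cycles.

    Each cycle is a (total_latency, total_iteration_distance) pair (assumed format).
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)

def min_ii(op_counts, unit_counts, cycles):
    """No valid modulo schedule can have an II below the larger of the two bounds."""
    return max(res_mii(op_counts, unit_counts), rec_mii(cycles))

# Illustrative loop body: 5 ALU ops on 2 ALUs, 3 memory ops on 1 memory port,
# and two dependence cycles with (latency, distance) of (4, 1) and (3, 2).
ops = {"alu": 5, "mem": 3}
units = {"alu": 2, "mem": 1}
recurrences = [(4, 1), (3, 2)]
print(min_ii(ops, units, recurrences))
```

A scheduler would typically try to build a schedule at this bound and increase II by one whenever the attempt fails. On a clustered machine the bound ignores intercluster communication, so the achieved II may be higher.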
“…Additionally, loop unrolling is selectively applied for the reason of lowering the pressure on intercluster paths. Fernandes et al [11] describe distributed modulo scheduling, an alternative integrated approach that sequentially uses three strategies for cluster assignment. The first strategy tries to assign a node to a cluster without involving explicit intercluster transfers.…”
Section: Introduction (mentioning)
confidence: 99%
“…Clustering can also be applied to VLIW architectures [8] [14]. In this case the partitioning is done at compile time.…”
Section: Related Work (mentioning)
confidence: 99%