Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems 2006
DOI: 10.1145/1168857.1168876

Instruction scheduling for a tiled dataflow architecture

Abstract: This paper explores hierarchical instruction scheduling for a tiled processor. Our results show that at the top level of the hierarchy, a simple profile-driven algorithm effectively minimizes operand latency. After this schedule has been partitioned into large sections, the bottom-level algorithm must more carefully analyze program structure when producing the final schedule. Our analysis reveals that at this bottom level, good scheduling depends upon carefully balancing instruction contention for processing el…

Cited by 31 publications (14 citation statements)
References 49 publications
“…Loading instructions onto the grid is done through co-operation of the microarchitecture and the runtime system. When an instruction produces a result for a consumer instruction not already on the grid, the runtime system is signalled [6]. The placement of the incoming instruction is decided using either a statically constructed table, or an online algorithm which creates a new mapping.…”
Section: Wavescalar Architecture (mentioning)
confidence: 99%
“…Inter-cluster communication is through a dynamically-routed packet network. Recently, a hierarchical instruction scheduling algorithm [6] has been proposed for the WaveScalar architecture, which partitions the application's dataflow graph into smaller groups, and assign these groups to PEs. We augmented hierarchical instruction scheduling algorithm to take into consideration the control flow information (i.e.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, the focus of architecture-specific schedulers has typically been on developing polynomial-time algorithms that approximate the optimal solution using knowledge about the architecture. Chronologically, this body of work includes the BUG scheduler for VLIW proposed in 1985 [Ellis 1985], UAS scheduler for clustered VLIW [Özer et al 1998], synchronous dataflow graph scheduling [Battacharyya et al 1996], RAW scheduler [Lee et al 1998], CARS VLIW code generation and scheduler [Kailas et al 2001], TRIPS scheduler [Coons et al 2006; Nagarajan et al 2004], Wavescalar scheduler [Mercaldi et al 2006a], and CCA scheduler proposed in 2008. While heuristic-based approaches are popular and effective, they have three problems: (1) poor compiler developer/architect productivity since new algorithms, heuristics, and implementations are required for each architecture; (2) lack of insight on optimality of solution; and (3) sandboxing of heuristics to specific architectures - understanding and using techniques developed for one spatial architecture in another is very hard.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, the focus of architecture-specific schedulers has typically been on developing polynomial-time algorithms that approximate the optimal solution using knowledge about the architecture. Chronologically, this body of work includes the BUG scheduler for VLIW proposed in 1985 [17], UAS scheduler for clustered VLIW [41], synchronous data-flow graph scheduling [7], RAW scheduler [36], CARS VLIW code-generation and scheduler [33], TRIPS scheduler [12,40], Wavescalar scheduler [37], and CCA scheduler proposed in 2008 [43]. While heuristic-based approaches are popular and effective, they have three problems: i) poor compiler developer/architect productivity since new algorithms, heuristics, and implementations are required for each architecture, ii) lack of insight on optimality of solution, and iii) sandboxing of heuristics to specific architectures - understanding and using techniques developed for one spatial architecture in another is very hard.…”
Section: Introduction (mentioning)
confidence: 99%