Region-based hierarchical operation partitioning for multicluster processors

Chu, Michael; Fan, Kevin; Mahlke, Scott

doi:10.1145/781163.781165

Cited by 19 publications

(32 citation statements)

References 10 publications

(11 reference statements)


“…One important category is the code generation for loops [1,2,10,20,25] by means of modulo scheduling techniques [9,23]. Another category schedules instructions for more general program structures including cyclic and acyclic control flow graphs [6,8,12,13,16,17,21]. In this paper, we focus on the latter category.…”

Section: 3 (mentioning)

confidence: 99%


A software-hardware hybrid steering mechanism for clustered microarchitectures

Cai, Codina, González, et al. 2008

2008 IEEE International Symposium on Parallel and Distributed Processing



“…VLIW) [6,13,8,12,16,17,21], where the compiler is responsible for both code scheduling and instruction distribution among clusters. However, as we will show later in this paper, the software-only approach performs much worse than its hardware-only counterpart when it is applied to out-of-order processors.…”

Section: Introduction (mentioning)

confidence: 99%

A software-hardware hybrid steering mechanism for clustered microarchitectures

Cai, Codina, González, et al. 2008

2008 IEEE International Symposium on Parallel and Distributed Processing


“…Codina et al [24] used a similar strategy as UAS but focused on modulo scheduling. Using graph partitioners for clustering operations has also been investigated in several studies [5,25]. Most of these studies were much different from our work, since our architecture has more features than clustering, and we mainly focus on register allocation targeted toward acyclic code to tie in with the phase ordering in ORC infrastructure.…”

Section: Related Work (mentioning)

confidence: 99%

PALF: compiler supports for irregular register files in clustered VLIW DSP processors

Lin, You, Lee 2007

Concurrency and Computation


SUMMARY

Wide varieties of register file architectures, developed for embedded processors, have turned to aim at reducing the power dissipation and die size these years, by contrast with the traditional unified register file structures. This article presents a novel register allocation scheme for a clustered VLIW DSP, which is designed with distinctively banked register files in which port access is highly restricted. Whilst the organization of the register files is designed to decrease the power consumption by using fewer port connections, the cluster-based design makes register access across clusters an additional issue, and the switched-access nature of the register file demands further investigations into optimizing register assignment for increasing the instruction-level parallelism. We propose a heuristic algorithm, named ping-pong aware local favorable (PALF) register allocation, to obtain advantageous register allocation that is expected to better utilize irregular register file architectures. The results of experiments performed using a compiler based on the Open Research Compiler (ORC) showed significant performance improvement over the original ORC's approach, which is considered to be an optimized approach for common register file architectures.

Key words: register allocation; ping-pong register file; DSP; VLIW


“…The only drawback of such an architecture is the intercluster communication cost. Various groups [2,3,4,6,9,10] have studied cluster assignment mechanisms for one thread to reduce the overhead of inter-cluster communication. We extend their ideas to a clustered architecture with multiple threads.…”

Section: Related Work (mentioning)

confidence: 99%