“…However, for a certain subclass of PBQP an efficient solver [3,70] exists that computes the optimal solution in linear time and applies heuristics in order to compute a solution for general PBQP problems in cubic runtime. This solver has been applied to different tasks of a compiler backend like code selection [28], register allocation [68] as well as address mode selection [69] yielding good results. The PBQP is formally defined over an n-tuple of boolean decision vectors X = x 1 , .…”
Compiler-in-the-Loop (CiL) architecture exploration is widely accepted as the right approach for fast development of Application Specific Instruction-set Processors (ASIPs). In this context, both automatic application-specific Instruction Set Extension (ISE) and code generation by a compiler have received considerable attention. Together, these techniques enable processor designers to quickly adapt a processor's Instruction Set Architecture (ISA) to the needs of a given set of applications and to provide an appropriate high-level programming model. This manuscript presents a tool flow for automated identification and utilization of Custom Instructions (CIs) during architecture exploration. By embedding this tool flow in an industry-proven architecture exploration framework, a methodology for simultaneous compiler/architecture co-exploration is derived. The advantage of the presented tool flow lies in its ability to develop a reusable ISA and an appropriate compiler for a set of applications, and thereby to support the design of programmable architectures. In addition, ASIP architecture exploration is improved because time-consuming application analysis and compiler retargeting are automated. Compiling and simulating several benchmarks against the extended ISAs provides reliable feedback on the speedup, code size, and usability of the identified CIs. Furthermore, results on area consumption for the extended ISAs are presented in order to compare the obtained speedup with the hardware effort invested in the new CIs. Extension of Conference Paper: An earlier version [66] of this paper appeared in the proceedings of the 5th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis. It introduces a code generator named CBurg, which is now used to implement a code-selector engine of the CoSy compiler system from ACE.
Additionally, a methodology for recurrence-aware identification of custom instructions is presented that builds on the data flow graphs from the compiler's intermediate representation. At the same time, it produces a code-generator description which is used to retarget the compiler backend to a new instruction set.
“…A number of papers have investigated the use of multi-bank memory to achieve maximum instruction level parallelism [1, 5-16]. Among these previous studies, only two methods in [6-9] contain all five phases.…”
Section: Related Work (mentioning)
confidence: 99%
“…Methods in [1, 5, 10, 11] contain all phases except for register/accumulator assignment, and others in [12, 13] are simply variable partitioning mechanisms. For heterogeneous register sets, [14-16] present specific register allocation algorithms to fit their irregularity. In addition, because nested loops are the time-critical sections in DSP applications, their execution time will dominate the entire computational performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…Although parallel access, which is enabled by multi-bank memory, is useful to explore the potential of higher memory bandwidth, it gives rise to the problem of how to partition the variables into the multiple memory banks [1, 5-16]. Similarly, using heterogeneous register sets can decrease the architectural complexity but increases the difficulty of deciding which register set to use for a certain instruction [6, 7].…”
To meet strict speed and power requirements for embedded applications, many high-end Digital Signal Processors (DSPs) employ non-orthogonal architectures, typically characterized by irregular data paths, heterogeneous registers, and multiple memory banks. Harvesting the benefits of such architectures requires sufficient compiler support. However, their complexity presents a great challenge to compiler design, and the usual compilation techniques for general-purpose CPUs do not adapt well to the irregularity of DSPs. The entire code generation process must include the following phases: intermediate representation, code compaction, instruction scheduling, memory bank assignment (or variable partitioning), and register/accumulator assignment. Much related research considers only some of these phases, which is inadequate. In this paper, we present an effective code generation algorithm named Rotation Scheduling with Spill Codes Predicting (RSSP) to maximally exploit the benefits of non-orthogonal architectures. It contains six parts that cover almost all phases of the code generation process. In addition to introducing the detailed principles and algorithms of the proposed RSSP, we use an analytic model to evaluate its preliminary performance. The evaluation results demonstrate the effectiveness of the proposed method. Furthermore, we present some preliminary ideas for generalizing RSSP, which can make it more practicable and suitable for various DSPs with similar architectural features.
“…(Leupers and Kotte, 2001; Saghir et al., 1994) focus on designing variable partitioning mechanisms, which try to evenly distribute memory accesses and explore the potential of higher memory bandwidth. For heterogeneous register sets, (Daveau et al., 2004; Scholz and Eckstein, 2002; Zhuang et al., 2004) present specific register allocation algorithms to fit their irregularity. Methods proposed in Lee and Chen (2004), Saghir et al. (1996), Wang and Hu (2004), Zhuge et al. (2001) solve both instruction scheduling and memory bank assignment problems, but do not consider the limitation of registers/accumulators.…”