This paper provides an overview of the Parallel Architecture Core (PAC) project led by SoC Technology Center of Industrial Technology Research Institute (STC/ITRI) in Taiwan. The background of PAC project, a brief introduction to PAC core technologies, PAC SoC development suite, PAC benchmarks, and applications are presented. The main objective of the PAC development plan is to enhance industrial development competitiveness in the core technology related to key components, especially for portable multimedia applications.
Abstract.Compiler is substantially regarded as the most essential component in the software toolchain to promote a successful processor design. This paper describes our preliminary employment of the Open Research Compiler (ORC) infrastructure on a novel VLIW DSP processor (known as PAC DSP core) and its specific compilation and optimization design. The PAC DSP processor exceedingly utilized port-restricted, distinct partitioned register file structures in addition to the heterogeneous clustered datapath architecture to attain low power consumption and reduced die size; however, these architectural features lend new challenges to the compiler construction. As part of an effort to deal with the challenges of efficient code generation for PAC DSP, the register allocation scheme developed in this work and other retargeting optimization phases are also presented. Results indicated that our compiler development for PAC DSP could gives an early estimation of architecture performance so that refinements of architectures are possible with the software feedbacks. Our experiences in designing the compiler support for heterogeneous VLIW DSP processors with irregular resource constraints may benefit those who have interests in the compiler construction for the similar architectures.
SUMMARYEmbedded processors developed within the past few years have employed novel hardware designs to reduce the ever-growing complexity, power dissipation, and die area. Although using a distributed register file architecture is considered to have less read/write ports than using traditional unified register file structures, it presents challenges in compilation techniques to generate efficient codes for such architectures. This paper presents a novel scheme for register allocation that includes global and local components on a VLIW DSP processor with distributed register files whose port access is highly restricted. In the scheme, an optimization phase performed prior to conventional global/local register allocation, named global/local register file assignment (RFA), is used to minimize various register file communication costs. A heuristic algorithm is proposed for global RFA to make suitable decisions based on local RFA. Experiments were performed by incorporating our schemes on a novel VLIW DSP processor with non-uniform register files. The results indicate that the compilation based on our proposed approach delivers significant performance improvements, compared with the solution without using our proposed global register allocation scheme.
Abstract. High-performance and low-power VLIW DSP processors are increasingly deployed on embedded devices to process video and multimedia applications. For reducing power and cost in designs of VLIW DSP processors, distributed register files and multi-bank register architectures are being adopted to eliminate the amount of read/write ports in register files. This presents new challenges for devising compiler optimization schemes for such architectures. In our research work, we address the compiler optimization issues for PAC architecture, which is a 5-way issue DSP processor with distributed register files. We show how to support an important class of compiler optimization problems, known as copy propagations, for such architecture. We illustrate that a naive deployment of copy propagations in embedded VLIW DSP processors with distributed register files might result in performance anomaly. In our proposed scheme, we derive a communication cost model by cluster distance, register port pressures, and the movement type of register sets. This cost model is used to guide the data flow analysis for supporting copy propagations over PAC architecture. Experimental results show that our schemes are effective to prevent performance anomaly with copy propagations over embedded VLIW DSP processors with distributed files.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.