Clustered VLIW architectures have been widely adopted in modern embedded multimedia applications for their ability to exploit high degrees of ILP at a reasonable cost in complexity and silicon area. Studies have, however, shown limited performance scaling for wide-issue machines. In this paper we describe the architecture of a clustered VLIW with a runtime-reconfigurable inter-cluster bus designed to address this scalability problem. The architecture targets kernel-loop acceleration through a coprocessor approach and allows the interconnect between neighboring register files to be customized before each loop execution. We have adopted an inter-cluster communication mechanism based on a constant-complexity interconnect: because its complexity and latency are independent of the number of clusters, scalability with issue width is preserved. To handle the limited connectivity, the interconnection resources in the inter-cluster bus are exposed to the compiler and scheduled like other resources with an adapted version of modulo scheduling. Other relevant features include the capability to define shifting queues in the register files, for more effective software-pipelining support. Adding a limited amount of reconfigurability to the well-established VLIW programming model results in low-overhead inter-cluster communication and a scalable ILP architecture. Simulation results show near-linear scalability for certain classes of kernel loops.
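The idea of exposing interconnect resources to the compiler and scheduling them alongside functional units can be sketched as follows. This is a minimal illustrative model, not the paper's algorithm: the greedy placement policy, the resource names (`alu`, `bus`), and the toy loop body are assumptions, and real modulo scheduling also honors data-dependence and recurrence constraints.

```python
# Hedged sketch: modulo scheduling with the inter-cluster bus exposed as an
# ordinary schedulable resource. Illustrative only -- real schedulers also
# enforce data-dependence and recurrence (loop-carried) constraints.

def modulo_schedule(ops, resources, max_ii=32):
    """Find the smallest initiation interval (II) at which every op fits.

    ops       -- list of (name, resource, earliest_cycle) tuples
    resources -- units available per cycle, e.g. {"alu": 2, "bus": 1}
    """
    for ii in range(1, max_ii + 1):
        # Modulo reservation table: free units per resource in each of II slots.
        table = {r: [n] * ii for r, n in resources.items()}
        placement = {}
        for name, res, earliest in ops:
            # Trying II consecutive cycles covers every modulo slot once.
            for cycle in range(earliest, earliest + ii):
                if table[res][cycle % ii] > 0:
                    table[res][cycle % ii] -= 1
                    placement[name] = cycle
                    break
            else:
                break  # no free slot at this II; try a larger one
        else:
            return ii, placement
    return None

# Two transfers compete for the single inter-cluster bus, forcing II up to 2.
ii, placement = modulo_schedule(
    [("load", "alu", 0), ("xfer1", "bus", 1), ("xfer2", "bus", 1), ("add", "alu", 2)],
    {"alu": 2, "bus": 1},
)
```

Treating the bus as one more row of the reservation table is what lets a standard modulo scheduler absorb the limited connectivity without a special communication-insertion pass.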
Abstract. In this paper we review the architectures designed for wavelet transforms, with the purpose of highlighting their suitability for inclusion in codec systems. Indeed, common VLSI cost functions (such as AT²) are insufficient to evaluate architectures for compression. At the system level, quantization and coding have processing requirements that must be taken into account when designing the transform engine. The hierarchical structure of the wavelet transform allows the use of "pyramid" algorithms that optimize latency and processor utilization; on-line solutions try to minimize buffering memory. Such approaches can be substituted with more standard ones if data reordering is mandatory to apply a good quantization strategy. An upcoming commercial solution offers a sound comparison paradigm.
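The hierarchical structure that pyramid algorithms exploit can be illustrated with a one-dimensional Haar decomposition. The Haar filter pair is chosen here only for brevity (production codecs typically use longer filters); the point is the geometric shrink: each level halves the approximation band, so total work is about 2N operations, which is what pyramid schedulers exploit for latency and processor utilization.

```python
import math

# Hedged sketch of a "pyramid" wavelet decomposition: a 1-D Haar DWT.
# Each level halves the signal, so the work per level shrinks geometrically.

def haar_pyramid(signal, levels):
    """Return (final approximation band, per-level detail coefficients)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        next_approx, d = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            next_approx.append((a + b) / math.sqrt(2))  # low-pass branch
            d.append((a - b) / math.sqrt(2))            # high-pass branch
        details.append(d)
        approx = next_approx  # only this half feeds the next level
    return approx, details

approx, details = haar_pyramid([1.0, 1.0, 1.0, 1.0], 2)
```

Note that only the approximation band is recursed on; the detail bands are emitted immediately, which is why on-line architectures can stream them to the quantizer with little buffering unless the coder demands a different coefficient ordering.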