Clustered VLIW architectures have been widely adopted in modern embedded multimedia applications for their ability to exploit high degrees of ILP at a reasonable cost in complexity and silicon area. Studies have, however, shown limited performance scaling for wide-issue machines. In this paper we describe the architecture of a clustered VLIW with a runtime-reconfigurable inter-cluster bus designed to address this scalability problem. The architecture targets kernel-loop acceleration through a coprocessor approach and allows the interconnect between neighboring register files to be customized before each loop execution. We have adopted an inter-cluster communication mechanism based on a constant-complexity interconnect: because its complexity and latency are independent of the number of clusters, scalability with issue width is preserved. To handle the limited connectivity, the interconnection resources in the inter-cluster bus are exposed to the compiler and scheduled like other resources with an adapted version of modulo scheduling. Other relevant features include the capability to define shifting queues in the register files, for more effective software-pipelining support. Adding a limited amount of reconfigurability to the well-established VLIW programming model results in low-overhead inter-cluster communication and a scalable ILP architecture. Simulation results show near-linear scalability for certain classes of kernel loops.
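The idea of exposing interconnect resources to the compiler and scheduling them alongside functional units can be sketched as follows. This is a minimal illustrative model, not the paper's algorithm: the greedy placement policy, the resource names (`alu`, `bus`), and the toy loop body are assumptions, and real modulo scheduling also honors data-dependence and recurrence constraints.

```python
# Hedged sketch: modulo scheduling with the inter-cluster bus exposed as an
# ordinary schedulable resource. Illustrative only -- real schedulers also
# enforce data-dependence and recurrence (loop-carried) constraints.

def modulo_schedule(ops, resources, max_ii=32):
    """Find the smallest initiation interval (II) at which every op fits.

    ops       -- list of (name, resource, earliest_cycle) tuples
    resources -- units available per cycle, e.g. {"alu": 2, "bus": 1}
    """
    for ii in range(1, max_ii + 1):
        # Modulo reservation table: free units per resource in each of II slots.
        table = {r: [n] * ii for r, n in resources.items()}
        placement = {}
        for name, res, earliest in ops:
            # Trying II consecutive cycles covers every modulo slot once.
            for cycle in range(earliest, earliest + ii):
                if table[res][cycle % ii] > 0:
                    table[res][cycle % ii] -= 1
                    placement[name] = cycle
                    break
            else:
                break  # no free slot at this II; try a larger one
        else:
            return ii, placement
    return None

# Two transfers compete for the single inter-cluster bus, forcing II up to 2.
ii, placement = modulo_schedule(
    [("load", "alu", 0), ("xfer1", "bus", 1), ("xfer2", "bus", 1), ("add", "alu", 2)],
    {"alu": 2, "bus": 1},
)
```

Treating the bus as one more row of the reservation table is what lets a standard modulo scheduler absorb the limited connectivity without a special communication-insertion pass.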
Abstract. In this paper we review the architectures designed for wavelet transforms, with the purpose of highlighting their suitability for inclusion in codec systems. Indeed, common VLSI cost functions (such as AT²) are insufficient to evaluate architectures for compression. At the system level, quantization and coding have processing requirements that must be taken into account when designing the transform engine. The hierarchical structure of the wavelet transform allows the use of "pyramid" algorithms that optimize latency and processor utilization; on-line solutions try to minimize buffering memory. Such approaches can be substituted with more standard ones if data reordering is mandatory to apply a good quantization strategy. An upcoming commercial solution offers a sound comparison paradigm.
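The hierarchical structure that pyramid algorithms exploit can be illustrated with a one-dimensional Haar decomposition. The Haar filter pair is chosen here only for brevity (production codecs typically use longer filters); the point is the geometric shrink: each level halves the approximation band, so total work is about 2N operations, which is what pyramid schedulers exploit for latency and processor utilization.

```python
import math

# Hedged sketch of a "pyramid" wavelet decomposition: a 1-D Haar DWT.
# Each level halves the signal, so the work per level shrinks geometrically.

def haar_pyramid(signal, levels):
    """Return (final approximation band, per-level detail coefficients)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        next_approx, d = [], []
        for a, b in zip(approx[0::2], approx[1::2]):
            next_approx.append((a + b) / math.sqrt(2))  # low-pass branch
            d.append((a - b) / math.sqrt(2))            # high-pass branch
        details.append(d)
        approx = next_approx  # only this half feeds the next level
    return approx, details

approx, details = haar_pyramid([1.0, 1.0, 1.0, 1.0], 2)
```

Note that only the approximation band is recursed on; the detail bands are emitted immediately, which is why on-line architectures can stream them to the quantizer with little buffering unless the coder demands a different coefficient ordering.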