In modern digital signal processing and graphics applications, the shifter is an important module that accounts for a significant share of the delay. This brief presents an architectural optimization approach to synthesize a faster barrel shifter block, which can reduce the delay of the design without significantly increasing the area. We divide the problem of generating the shifter into two steps: i) timing-driven selection of multiple stages for merging, and ii) the design of the merged stage. In our proposed method, we define the dual merged stage, in which two stages are merged into a single composite stage, and the triple merged stage, in which three stages are merged into a single composite stage. These merged stages are identified by a timing-driven algorithm and are used in conjunction with some single stages of the traditional barrel shifter. The merged stages reduce the depth of the proposed barrel shifter architecture, thereby improving the delay. The timing-driven nature of our algorithm helps produce a faster implementation of the overall shifter block. We have evaluated the performance of our design using a number of technology libraries, timing constraints, and shifter bit-widths. Our experimental data shows that the shifter block generated by our algorithm is significantly faster (10.19% on average) than the shifter block generated by a commercially available datapath synthesis tool. These improvements were verified on placed-and-routed designs as well.
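The stage-merging idea can be illustrated with a small behavioral model. The following is a minimal sketch, not the authors' implementation: it assumes a logical left shifter, the function names and the `groups` parameter are hypothetical, and the grouping of stages is supplied by hand, whereas the abstract describes a timing-driven algorithm for choosing it.

```python
def shift_single_stages(value, amount, width):
    """Traditional barrel shifter model: one stage per shift-amount bit."""
    mask = (1 << width) - 1
    stages = (width - 1).bit_length()  # e.g. 4 stages for a 16-bit shifter
    for k in range(stages):
        # Stage k either passes the value through or shifts it by 2**k.
        if (amount >> k) & 1:
            value = (value << (1 << k)) & mask
    return value


def shift_merged_stages(value, amount, width, groups):
    """Model with merged stages (hypothetical helper).

    `groups` lists which shift-amount bit indices each composite stage
    handles: a group of two indices models a dual merged stage, a group
    of three a triple merged stage, and a group of one a single stage.
    """
    mask = (1 << width) - 1
    for group in groups:
        # A composite stage picks one of 2**len(group) shifted copies.
        offset = sum(((amount >> k) & 1) << k for k in group)
        value = (value << offset) & mask
    return value


def depth(groups):
    """Number of stages the data passes through in this model."""
    return len(groups)
```

With a 16-bit shifter, `groups=[(0, 1), (2,), (3,)]` models one dual merged stage followed by two single stages, so the data passes through three stages instead of four. The model says nothing about the delay of a composite stage itself, which is what the timing-driven selection in the abstract is meant to weigh.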
In state-of-the-art digital signal processing (DSP) and graphics applications, the shifter is an important module that accounts for a significant share of the delay. This paper presents a new architectural optimization approach to synthesize a faster barrel shifter block, which can reduce the delay of the design without significantly increasing the area. We divide the problem of generating the shifter into two steps: timing-driven selection of multiple stages for merging, and the design of the merged stage. The techniques used in these two steps help produce a faster implementation of the overall shifter block. Our experimental data shows that the shifter block generated by our algorithm is significantly faster (11.39% on average) than the corresponding block generated by a commercially available datapath synthesis tool.
In state-of-the-art digital designs, arithmetic blocks consume a major portion of the total area of the IC. The arithmetic sum-of-product (SOP) is the most widely used arithmetic block. Examples of SOP blocks include the adder, subtractor, multiplier, multiply-accumulator (MAC), squarer, chain-of-adders, incrementor, and decrementor. In this article, we introduce a novel, area-efficient architecture to share different SOP blocks that are used in a mutually exclusive manner. We implement the core functions of the largest SOP only once and reuse different parts of the core subblocks for all other SOP operations with the help of multiplexers. This architecture can be used in the non-timing-critical paths of the design to save significant amounts of area. Our experimental data shows that the proposed sharing-based architecture results in about 37% area savings compared to the results obtained from a commercially available best-in-class datapath synthesis tool. In addition, our proposed shared implementation consumes about 18% less power. These improvements were verified on placed-and-routed designs as well.
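The sharing principle can be sketched behaviorally. This is an illustrative model under stated assumptions, not the architecture from the article: the function name `shared_sop`, the operation names, and the choice of a single multiply-add core are all hypothetical. It shows only how operand multiplexers let mutually exclusive operations reuse one core.

```python
def shared_sop(op, a, b=0, c=0):
    """Evaluate one of several mutually exclusive SOP operations
    on a single shared multiply-add core (hypothetical model)."""
    # Operand multiplexers: the operation code selects what feeds the core.
    mul_rhs = {"mul": b, "mac": b, "square": a, "add": 1, "sub": 1}[op]
    addend = {"mul": 0, "mac": c, "square": 0, "add": b, "sub": -b}[op]
    # Shared core: one multiplier and one adder serve every operation.
    return a * mul_rhs + addend
```

For example, `shared_sop("mac", 3, 4, 5)` computes 3 × 4 + 5 on the same core that `shared_sop("square", 7)` uses for 7 × 7. The extra multiplexers add delay, which is consistent with the abstract restricting the technique to paths that are not timing-critical.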
We present a new detailed routing methodology specifically designed for datapath layouts. In typical state-of-the-art microprocessor designs, datapaths comprise about 70% of the logic (excluding caches). Although research on datapath placement and global routing has been reported, very little has been reported on detailed routing for datapaths. Datapaths typically comprise regular bit-slices which are replicated. We define a net-cluster, which is a collection of similarly structured nets present across different bit-slices. We introduce two clustering schemes (footprint-driven clustering and instance-driven clustering) to extract such net-clusters. Then, we perform strap-based routing on exactly one member net of each net-cluster (in a single representative bit-slice). Next, for each routed net, we propagate its route to all other nets in its net-cluster. Our algorithm is unique in that it performs the detailed routing on a single bit-slice and infers the routing for all bit-slices using the notion of net-clusters. We demonstrate at least 6× speed gains for industrial 32- and 64-bit datapath designs. The regularity of the routes across the bit-slices results in more predictable timing characteristics for the resulting datapath layout.
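The cluster-then-propagate flow can be sketched as follows. This is a simplified illustration under assumptions of our own: nets are described only by pin coordinates, a net's footprint is taken to be its pin positions relative to its bit-slice origin, and propagation is a pure translation. The abstract does not specify how the two clustering schemes or the strap-based router actually work, and all names here are hypothetical.

```python
from collections import defaultdict


def footprint(pins, slice_origin):
    """Pin positions relative to the bit-slice origin (assumed signature)."""
    ox, oy = slice_origin
    return tuple(sorted((x - ox, y - oy) for x, y in pins))


def cluster_nets(nets, slice_origins):
    """Group nets whose relative pin layout is identical across slices.

    `nets` maps net name -> (slice id, list of (x, y) pins).
    """
    clusters = defaultdict(list)
    for name, (slice_id, pins) in nets.items():
        clusters[footprint(pins, slice_origins[slice_id])].append(name)
    return list(clusters.values())


def propagate_route(route, from_origin, to_origin):
    """Copy a route from the representative slice to another slice
    by translating every point by the offset between slice origins."""
    dx = to_origin[0] - from_origin[0]
    dy = to_origin[1] - from_origin[1]
    return [(x + dx, y + dy) for x, y in route]
```

In this model only one net per cluster is ever routed; every other member receives a translated copy, which is where the runtime saving and the regularity of the resulting routes would come from.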