Abstract: A VLSI implementation of a programmable pipelined router scheme for parallel machine interconnection networks is presented in this paper. The implementation is based on a dynamic content-addressable memory (DCAM) that supports unique bit masking per entry. The number of required DCAM entries is extremely small; it is of the same order as the node degree (the number of output ports). This, in turn, makes it possible to use a dynamic content-addressable memory to reduce the physical size of the system. A DCAM cell is implemented with only six and a half transistors (one transistor is shared by two cells). We have provided circuitry and arranged timing so that the stored data are refreshed in a hidden fashion. In addition to the DCAM, we have incorporated a fast priority scheme that allows only one entry to be selected. The router executes routing algorithms in 1.5 clock cycles, which is the fastest approach for flexible routers. The prototype router has 24 entries and is able to sustain a throughput of one routing decision per cycle.
Index Terms: Content addressable memory, dynamic circuits, matching with ternary digits, parallel comparison, pipelining, routing algorithm execution, ternary digit logic.
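The per-entry bit masking and single-entry priority selection described in the abstract above can be illustrated in software. The following is a minimal sketch, not the authors' circuit: the names `Entry`, `ternary_match`, `route`, and `TABLE`, the 4-bit key width, and the table contents are all hypothetical, and first-match priority is assumed as the selection rule.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    value: int  # stored bit pattern
    care: int   # per-entry mask: 1 = bit must match, 0 = don't care
    port: int   # hypothetical output port selected by this entry


def ternary_match(key: int, entry: Entry) -> bool:
    """True when key equals the stored value on every bit the mask cares about."""
    return ((key ^ entry.value) & entry.care) == 0


def route(key: int, table: Sequence[Entry]) -> Optional[int]:
    """Return the port of the first matching entry (assumed priority order)."""
    for entry in table:
        if ternary_match(key, entry):
            return entry.port
    return None


# Hypothetical 4-bit routing table, highest priority first.
TABLE = [
    Entry(value=0b1010, care=0b1111, port=0),  # matches 1010 exactly
    Entry(value=0b1000, care=0b1100, port=1),  # matches 10xx
    Entry(value=0b0000, care=0b0000, port=2),  # matches xxxx (default)
]
```

In hardware all entries are compared in parallel and the priority scheme picks one match; the sequential loop here only models the outcome of that selection, not its timing.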
In this paper we introduce a VLSI priority encoder that uses a novel priority lookahead scheme to reduce the delay of the circuit's worst-case operation while maintaining a very low transistor count. The encoder's topmost input request has the highest priority; this priority descends linearly. Two design approaches for the priority encoder are presented, one without a priority lookahead scheme and one with it. For an N-bit encoder, the circuit with the priority lookahead scheme requires only 1.094 times the number of transistors of the circuit without it. Taking a 32-bit encoder as an example, the circuit with the priority lookahead scheme is 2.59 times faster than the circuit without it. The worst-case operation delay is 4.4 ns for this lookahead encoder, using a 1-µm scalable CMOS technology. The proposed lookahead scheme can be extended to larger encoders.
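The difference between the two encoder designs in the abstract above can be sketched behaviorally. This is an illustrative model only: the abstract does not describe the circuit structure, so the block-based lookahead, the block size of 4, and the function names `encode_ripple` and `encode_lookahead` are assumptions. Index 0 stands for the topmost, highest-priority request.

```python
from typing import List


def encode_ripple(req: List[bool]) -> List[bool]:
    """Grant the first active request; priority ripples through every bit."""
    grant = []
    higher_active = False
    for r in req:
        grant.append(r and not higher_active)
        higher_active = higher_active or r
    return grant


def encode_lookahead(req: List[bool], block: int = 4) -> List[bool]:
    """Same result, but priority skips whole blocks via a per-block 'any' signal."""
    grant: List[bool] = []
    blocked = False  # True once any higher-priority block holds a request
    for start in range(0, len(req), block):
        chunk = req[start:start + block]
        if blocked:
            grant.extend([False] * len(chunk))
        else:
            grant.extend(encode_ripple(chunk))
        # Lookahead signal: computed per block, independent of other blocks.
        blocked = blocked or any(chunk)
    return grant
```

Both functions produce the same one-hot grant vector. The point of the lookahead version is that the priority signal crosses the encoder block by block rather than bit by bit, which is one plausible way to shorten the worst-case path at a small cost in extra logic.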
In this paper, three carbon nanotube FET based static memory cells are compared on read and write delays, energy consumption, and performance under diameter variation corners. The carbon nanotube FET is currently considered a possible "beyond CMOS" device due to its 1-D transport properties, which include low carrier scattering and ballistic transport. The memory cells are classified by their transistor count (6-, 7-, and 8-transistor cells). At a nominal diameter of 1.51 nm, the 8-T cell has the lowest delay and energy consumption, 3.7 ps and 0.348 fJ, respectively. Simulations with transistor diameter variations show that small n-type device diameters result in significantly slower read and write delays. The 8-transistor cell dissipates the least energy when the transistor diameters range from 1.369 nm to 1.659 nm.
Interest in subthreshold design has increased due to the emergence of systems that require ultra-low power and the ever-increasing leakage currents (now used to drive logic). Subthreshold operation sacrifices speed for power, creating a clear divide between designing for high speed and designing for ultra-low power. It might be beneficial to allow subthreshold circuits to operate in super-threshold, depending on processing needs. In this paper, the feasibility of optimizing device sizes for both subthreshold and above-threshold operation is considered. In addition, body biasing techniques that could help bridge the speed gap are presented. Device sizing for circuits in the subthreshold region is examined with the view that these circuits could be optimized for subthreshold operation but also operate effectively in super-threshold. In an effort to attain optimal performance (speed-power), an operating region is identified in terms of the energy-delay product (EDP). To enhance the operating speed of both subthreshold and super-threshold circuits, a novel body biasing technique, termed tunable body biasing (TBB), is introduced. This approach leads to increased operating frequencies, particularly in subthreshold operation, and shows no performance degradation at voltages above threshold, thereby bridging the speed gap. Post-layout simulations of circuits ranging from simple to more complex ones enable effective evaluation of optimal device sizing and identification of the optimal power-speed operating region. Simulations have been performed at a modest 180 nm technology node, and circuits show optimal operating regions ranging from 0.5 to 1.1 V. Furthermore, results indicate that the TBB approach for an inverter triples speed and has a 60 percent lower EDP while dissipating just 28 percent more energy than a traditionally biased approach (pMOS bulk at VDD and nMOS bulk at VSS).
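The inverter figures quoted at the end of the abstract above can be cross-checked with a short calculation. This sketch assumes that the energy-delay product is simply energy times delay and that "triples speed" means the delay falls to one third; the function name `relative_edp` is hypothetical, and the inputs are the rounded values from the abstract.

```python
def relative_edp(speedup: float, energy_ratio: float) -> float:
    """EDP of the new design relative to the baseline (EDP = energy * delay)."""
    delay_ratio = 1.0 / speedup  # assumed: a 3x speedup means one third the delay
    return energy_ratio * delay_ratio


# Rounded figures from the abstract: 3x speed, 28 percent more energy.
edp_ratio = relative_edp(speedup=3.0, energy_ratio=1.28)
reduction_percent = (1.0 - edp_ratio) * 100.0

print(f"relative EDP: {edp_ratio:.3f}")
print(f"EDP reduction: {reduction_percent:.1f} percent")
```

Under these assumptions the EDP reduction comes out a little under 60 percent, which is broadly in line with the reported figure given that the quoted speedup and energy numbers are themselves rounded.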