An asynchronous implementation of the ARM microprocessor has been designed and fabricated based on Sutherland's Micropipeline approach. Reviews of this work have shown that considerable performance improvement may be possible in a number of key design areas. This paper assesses the effects of different design styles on the micropipeline latch structures used. The original design has latch structures based on pass-transistor transparent latches. An evaluation of the use of single-phase transparent latch structures is given, along with the application of 2-phase and 4-phase control techniques.
An asynchronous implementation of the ARM microprocessor has been developed using an approach based on Sutherland's Micropipelines [1]. The design allows considerable internal asynchronous concurrency. This paper presents the rationale for the work, the organization of the chip, and the characteristics of the prototype silicon. The design displays unusual properties such as nondeterministic (but bounded) prefetch depth beyond a branch instruction and data-dependent throughput, and employs a novel register locking mechanism. This work demonstrates the feasibility of building complex asynchronous systems and gives an indication of the costs and benefits of the Micropipeline approach.
A high performance register bank is a central component of a RISC processor. A novel register bank design has been developed, as an integral part of a self-timed implementation of a commercial RISC microprocessor, to address the problem of register interlocking in an asynchronous micropipelined execution unit. The challenge in an asynchronous design is to maintain coherent register operation while allowing concurrent read and write accesses with arbitrary timing. The solution presented here includes a novel arbiter-free locking mechanism which enables efficient read operations in the presence of multiple pending write operations.

1063-6404/92 $3.00 © 1992 IEEE

1: Introduction

The growth in demand for high performance portable computing equipment has led to a resurgence of interest in asynchronous logic design techniques. In order to investigate the power saving potential of asynchronous approaches to CMOS design, a self-timed implementation of the ARM microprocessor is being developed as a commercially realistic technology demonstrator. Earlier work [1] has shown the feasibility of building a complete asynchronous microprocessor; the current project addresses the detailed problems associated with implementing a commercial architecture with the specific goal of minimising power consumption.

The methodology being applied is based on Sutherland's "Micropipelines" [2], a bundled-data, bounded-delay model. Here, local timing signals are transmitted with a 'bundle' of data bits whose timing is constrained to ensure correct operation. This technique, rather than a purely delay-insensitive model [3], was chosen for its economy in silicon area and its potential for low electrical power consumption.
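The bundled-data idea described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not circuitry from the paper: it assumes two-phase (transition) signalling, in which every toggle of the request or acknowledge wire is an event, and the class name `BundledDataChannel` and its methods are hypothetical names chosen for illustration. The bounded-delay timing constraint is represented simply by writing the data before toggling the request.

```python
class BundledDataChannel:
    """Behavioural model of a two-phase bundled-data channel (illustrative only)."""

    def __init__(self):
        self.req = False        # request wire; each toggle is one event
        self.ack = False        # acknowledge wire; each toggle is one event
        self.data = None        # the data 'bundle' accompanying the request
        self.transitions = 0    # control-wire transitions counted so far

    def is_idle(self):
        # With transition signalling the channel is idle when req and ack match.
        return self.req == self.ack

    def send(self, value):
        if not self.is_idle():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.data = value           # bundling constraint: data is valid first...
        self.req = not self.req     # ...and only then is the request event issued
        self.transitions += 1

    def receive(self):
        if self.is_idle():
            raise RuntimeError("no request pending")
        value = self.data           # capture the bundle
        self.ack = not self.ack     # the acknowledge event frees the sender
        self.transitions += 1
        return value


# Usage: transfer three words and count the control transitions.
channel = BundledDataChannel()
received = []
for word in (0x10, 0x20, 0x30):
    channel.send(word)
    received.append(channel.receive())

print(received, channel.transitions)
```

In this model each transfer costs two control transitions, one on the request wire and one on the acknowledge wire. A four-phase (return-to-zero) protocol would need four transitions per transfer.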
The micropipeline approach is somewhat less 'pure' than other approaches to the construction of asynchronous systems because delays in the circuit must be managed quantitatively; however, these delays can be modelled and characterised in a similar manner to the critical path analysis used in the design of synchronous circuits.

The design of the processor can be decomposed into a few major structural elements, one being the register bank.

The ARM register bank contains thirty-one registers, of which sixteen are available to the programmer at a given time. All but one of these registers are general purpose and orthogonal; the implementation of the ARM register bank described here is therefore applicable to asynchronous implementations of other RISC processors.

2: Register locking

The ARM architecture [4] defines a register-based RISC processor in which arithmetic operations require two operands to be read from the register bank and a single result value to be returned. In existing synchronous implementations of the architecture, instruction execution is not pipelined (execution is a single stage of the Fetch-Decode-Execute pipeline) and an arithmetic operation is completed within a single clock cycle. In the asynchronous implementation, instruction execution is decomposed into a number of pipeline stages. This concurrent execution impr...
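The requirement that register locking addresses, reads that stay coherent while several writes are still pending, can be sketched with a simple behavioural model. This is not the arbiter-free circuit of the paper, whose details are not given in this excerpt; it is a generic pending-write counter per register, assumed here purely for illustration. The class `LockedRegisterBank` and its method names are hypothetical, and a read of a locked register returns `None` where hardware would stall.

```python
class LockedRegisterBank:
    """Illustrative register bank with per-register pending-write locks."""

    def __init__(self, n_regs=16):
        self.values = [0] * n_regs
        self.pending = [0] * n_regs   # outstanding writes per register

    def lock(self, rd):
        # Called when an instruction that will write rd enters the pipeline.
        self.pending[rd] += 1

    def try_read(self, rn):
        # A register with pending writes holds stale data, so the read must wait.
        if self.pending[rn] > 0:
            return None               # hardware would stall here instead
        return self.values[rn]

    def write_back(self, rd, value):
        # Called when a result returns; releases one lock on rd.
        if self.pending[rd] == 0:
            raise RuntimeError("write-back without a matching lock")
        self.values[rd] = value
        self.pending[rd] -= 1


# Usage: two in-flight instructions both target r1; r2 is unaffected.
bank = LockedRegisterBank()
bank.lock(1)
bank.lock(1)
blocked = bank.try_read(1)       # None: two writes to r1 are still pending
free = bank.try_read(2)          # 0: r2 has no pending writes
bank.write_back(1, 5)
still_blocked = bank.try_read(1) # None: one write to r1 is still pending
bank.write_back(1, 7)
final = bank.try_read(1)         # 7: all pending writes have completed
```

The counter is used only because it is the simplest way to model multiple pending writes to one register; it says nothing about how the paper's design avoids arbitration.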