In the DSP world, many media workloads have to perform a specific amount of work in a specific period of time. This observation led us to examine Simultaneous Multithreading (SMT) and Chip Multiprocessing (CMP) for a VLIW DSP architecture (specifically the Star*Core SC140), in conjunction with Frequency/Voltage scaling to decrease dynamic power consumption in next-generation wireless handsets. We study the resulting performance and power characteristics of the two approaches using simulation, compiled code, and realistic workloads that respect real-time constraints. We find that a multithreaded DSP can utilize the available functional units much more efficiently, performing as well as a non-multithreaded DSP but with substantial power savings. Power consumption can also be lowered by using a chip-multiprocessor (CMP) operating at low frequency. We compare the power consumption of an SMT DSP with a CMP DSP under different architectural assumptions; we find that the SMT DSP uses up to 40% less power than the CMP DSP in our target environment.
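To make the frequency/voltage scaling argument concrete, the sketch below uses the standard first-order CMOS approximation for dynamic power, P ≈ α·C·V²·f. The capacitance, voltages, and frequencies are hypothetical values chosen only for illustration; they are not taken from the paper or from the SC140. The point is that lowering the clock allows a lower supply voltage, and because power scales with V², two cores at half frequency can draw less dynamic power than one core at full frequency.

```python
def dynamic_power(c_eff, v_dd, freq_hz, activity=1.0):
    """First-order CMOS dynamic power in watts: activity * C * V^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq_hz


# Hypothetical operating points (illustrative only, not from the paper).
C_EFF = 1e-9          # effective switched capacitance per core, in farads
V_FULL, F_FULL = 1.2, 300e6   # one core at full frequency
V_LOW, F_LOW = 0.9, 150e6     # each of two cores at half frequency

single_core = dynamic_power(C_EFF, V_FULL, F_FULL)
dual_core = 2 * dynamic_power(C_EFF, V_LOW, F_LOW)

# Same aggregate cycles per second, so the ratio reduces to (V_LOW / V_FULL)^2.
ratio = dual_core / single_core
print(f"single core: {single_core:.3f} W")
print(f"two cores:   {dual_core:.3f} W")
print(f"ratio:       {ratio:.4f}")
```

This model ignores static (leakage) power and the area cost of a second core, both of which matter when comparing real SMT and CMP designs.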
The CRISP Microprocessor

The AT&T CRISP Microprocessor is a high-performance, general-purpose 32-bit processor. It has been implemented as a single CMOS chip containing 172,163 transistors in a 1.75 µm CMOS technology and runs at a clock frequency of 16 MHz [1]. The CRISP Microprocessor achieves performance through traditional techniques, such as pipelining, and through several new techniques not previously found in microprocessor designs. This paper focuses on a detailed description of the hardware architecture, including the pipeline structure and details of the architectural innovations. A brief introduction to the instruction set and its major features is given for background.

The CRISP instruction set is carefully streamlined to allow an efficient pipelined implementation. CRISP consists of two logically separate machines, a Prefetch and Decode Unit and an Execution Unit, connected by a decoded-instruction cache. With this decoupled parallel operation and internal pipelining, CRISP is capable of issuing a new instruction every cycle. Fast operand access is accomplished with Stack Cache registers [2] instead of general-purpose data registers.

Efficient procedure calls are possible because of the Stack Cache and a minimal subroutine linkage mechanism. Branches can be executed in zero time by Branch Folding [3]. A highly decoded instruction cache allows memory-to-memory style instructions to be executed, often in a single cycle, by a RISC-style Execution Unit. Code generation by compilers is simplified because there are only a few instructions and addressing modes to choose from, and register allocation is not required. A variable-length instruction encoding yields good code density (equal to that of the VAX) and reduces off-chip instruction traffic. These instructions are translated by the Prefetch and Decode Unit to a fixed-length internal format for high-speed execution.

Since 1975 the Bell Labs C Machine Project has designed several computer architectures to support the C Programming Language efficiently [2, 4, 5, 6]. These designs evolved into the current C Machine instruction-set architecture. The CRISP Microprocessor represents a particular implementation of the C Machine architecture. The current architecture stabilized in 1981 for an ECL implementation that was never completed. The design team for CRISP was formed in April 1983 and the first mask was submitted for fabrication in February 1986.

The goal of the C Machine Project was to design and build a computer with significantly better cost/performance characteristics than commercially available computers. We were seeking architectural changes that could provide an order of magnitude greater performance than the machines commonly available to us. The C Machine was designed with an iterative methodology based on extensive measurements of C programs. Part of the method consisted of a cycle of proposing a machine, writing a compiler, running a large body of UNIX software through the compiler and analysis tools, and then using measurements to add or delete features and propose a new machine. ...
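The Branch Folding idea mentioned in the abstract can be illustrated with a toy model. The sketch below is a simplified assumption of how folding might work, not the CRISP hardware design: each entry in a decoded-instruction cache carries an explicit next-address field, so an unconditional jump is absorbed into the instruction before it and never occupies an execution slot. The names `Instr`, `Decoded`, `resolve`, `fold_branches`, and `run` are hypothetical, and conditional branches are deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Instr:
    op: str                       # e.g. "add", "jmp", "halt"
    target: Optional[int] = None  # jump target index; required when op == "jmp"


@dataclass
class Decoded:
    op: str
    next_pc: int  # address of the next non-jump instruction to execute


def resolve(program: List[Instr], pc: int) -> int:
    """Follow a chain of unconditional jumps starting at pc."""
    seen = set()  # guards against jump cycles
    while pc < len(program) and program[pc].op == "jmp" and pc not in seen:
        seen.add(pc)
        pc = program[pc].target
    return pc


def fold_branches(program: List[Instr]) -> Dict[int, Decoded]:
    """Build decoded entries; jumps get no entry of their own."""
    decoded = {}
    for pc, ins in enumerate(program):
        if ins.op == "jmp":
            continue  # folded into the preceding instruction's next_pc
        decoded[pc] = Decoded(ins.op, resolve(program, pc + 1))
    return decoded


def run(decoded: Dict[int, Decoded], start: int) -> List[str]:
    """Execute decoded entries and return the ops that used an issue slot."""
    trace, pc = [], start
    while pc in decoded:
        entry = decoded[pc]
        trace.append(entry.op)
        if entry.op == "halt":
            break
        pc = entry.next_pc
    return trace


program = [Instr("add"), Instr("jmp", 3), Instr("sub"), Instr("mul"), Instr("halt")]
decoded = fold_branches(program)
trace = run(decoded, resolve(program, 0))
print(trace)
```

In this toy program the jump at index 1 never appears in the execution trace: the `add` entry points directly at `mul`, which is the sense in which the branch costs no execution cycle.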