A highly parallel (more than a thousand processing elements) dataflow machine, the EM-4, is now under development. The EM-4 design principle is to construct a high-performance computer using a compact architecture by overcoming several defects of dataflow machines. In constructing the EM-4, it is essential to fabricate a processing element (PE) on a single chip in order to improve operation speed and reduce system size, design complexity, and cost. In the EM-4, the PE, called the EMC-R, has been specially designed using a 50,000-gate gate array chip. This paper focuses on the architecture of the EMC-R. Its distinctive features are: a strongly connected arc dataflow model; a direct matching scheme; a RISC-based design; a deadlock-free on-chip packet switch; and the integration of a packet-based circular pipeline and a register-based advanced control pipeline. These features are examined in detail, and the instruction set architecture and the configuration architecture that exploit them are described.
A register cache has been proposed to solve the problems caused by the huge register files of recent superscalar processors. The register cache reduces the effective access latency of the register file to improve IPC, simplifies the bypass network, and reduces the number of ports on the main register file. Although the primary purpose of previous work is to improve IPC, misses in the register cache may degrade it. We propose the Non-Latency-Oriented Register Cache System (NORCS). NORCS provides the same benefits as conventional systems, but it is free from the register cache miss penalties that conventional systems suffer from. In NORCS, the register cache itself does not differ from that of conventional systems. The difference is that the instruction pipeline has stages for reading the main register file, and all instructions go through these stages regardless of whether they hit or miss in the register cache. Therefore, the instruction pipeline of NORCS is not immediately disturbed by register cache misses. For a realistic 4-way superscalar processor, NORCS can simplify the bypass network to the same complexity as that of a 1-cycle-latency register file, and reduce the ports of the main register file from 12 to 4. CACTI simulation shows that the area and power consumption are reduced to 24.9% and 31.9%, respectively, of those of the baseline model without a register cache. Although these results do not differ from those of conventional systems, the IPCs differ greatly. The IPC of the conventional system decreases to 83.1% because of the cache miss penalties, while that of NORCS is retained at 98.0%.
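The contrast between the two pipelines can be illustrated with a deliberately simplified cycle count. The sketch below is a toy model, not the simulator or methodology of the paper: it assumes a scalar pipeline that issues one instruction per cycle, a fixed stall per miss in the conventional design, and a fixed number of main-register-file read stages in the NORCS-style design. The function names, the `miss_penalty` and `rf_read_stages` parameters, and the miss pattern are all hypothetical, and the resulting numbers are not the figures reported in the abstract.

```python
def conventional_cycles(miss_flags, miss_penalty=2):
    """Toy latency-oriented pipeline: the pipeline assumes a register cache
    hit, so each miss stalls it for `miss_penalty` cycles (assumed value)."""
    return len(miss_flags) + miss_penalty * sum(miss_flags)


def norcs_like_cycles(miss_flags, rf_read_stages=2):
    """Toy non-latency-oriented pipeline: every instruction passes through
    the main-register-file read stages whether it hits or misses, so a miss
    adds no stall. The extra stages cost only a one-time pipeline fill
    (assumed value); port contention is ignored in this sketch."""
    return len(miss_flags) + rf_read_stages


def ipc(n_instructions, cycles):
    """Instructions per cycle for a completed run."""
    return n_instructions / cycles


if __name__ == "__main__":
    # Illustrative miss pattern: every eighth instruction misses.
    flags = [i % 8 == 7 for i in range(1000)]
    print("conventional IPC:", ipc(len(flags), conventional_cycles(flags)))
    print("NORCS-like IPC:  ", ipc(len(flags), norcs_like_cycles(flags)))
```

In this model the conventional cycle count grows with the number of misses, while the NORCS-like count depends only on the pipeline depth, which is the qualitative point made above. A realistic evaluation would also have to account for the limited read ports of the main register file and for the longer pipeline's effect on branch misprediction penalties.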