Datarol-II, which is tolerant of the latencies caused
Abstract

We discuss a design principle for massively parallel distributed-memory multiprocessor architectures that addresses the latency problem, and present the Datarol machine architecture. Latencies caused by remote memory access and remote procedure calls are among the most serious problems in massively parallel computers. To eliminate the processor idle time caused by these latencies, processors must perform fast context switching among fine-grain concurrent processes.

First, we present a processor architecture, called Datarol-II, that promotes efficient fine-grain multithreaded execution by performing fast context switching among fine-grain concurrent processes. In the Datarol-II processor, an implicit register load/store mechanism is embedded in the execution pipeline to reduce the memory access overhead caused by context switching. To reduce local memory access latency, a two-level hierarchical memory system and a load control mechanism are also introduced.

Then, we present a cost-effective design of the Datarol-II processor, which incorporates an off-the-shelf high-end microprocessor while preserving the fine-grain dataflow concept. An off-the-shelf Pentium microprocessor is used for core processing, and a co-processor called FMP (Fine-grain Message Processor) is designed for fine-grain message handling and communication control. The FMP co-processor is designed on the basis of the FMD (Fine-grain Message Driven) execution model, in which fine-grain multithreaded execution is driven and controlled by simple fine-grain message communications.
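The message-driven idea can be illustrated with a small software sketch. It is not the Datarol-II or FMP design: the names `Processor`, `send`, `remote_read`, `start`, `finish`, and `demo` are hypothetical, and a Python queue stands in for hardware message handling. The sketch only shows the general principle that a thread issuing a remote read ends immediately, and that the reply arrives as a message which activates a continuation thread, so the processor can run other threads in the meantime.

```python
from collections import deque


class Processor:
    """Toy message-driven processor (illustrative assumption, not the FMP)."""

    def __init__(self, memory):
        self.memory = memory   # stands in for remote memory
        self.queue = deque()   # pending fine-grain messages
        self.trace = []        # order in which threads were activated

    def send(self, thread, frame, value=None):
        # A message names the thread to run, its context frame, and an operand.
        self.queue.append((thread, frame, value))

    def remote_read(self, addr, cont, frame):
        # Split-phase read: the reply is just another message that activates
        # the continuation thread later. The caller does not wait for it.
        self.send(cont, frame, self.memory[addr])

    def run(self):
        # Each message activates exactly one short thread.
        while self.queue:
            thread, frame, value = self.queue.popleft()
            self.trace.append((thread.__name__, frame["id"]))
            thread(self, frame, value)


def start(proc, frame, _):
    # Issue the read and end this thread; other threads can run meanwhile.
    proc.remote_read(frame["addr"], finish, frame)


def finish(proc, frame, value):
    # Continuation thread, activated by the reply message.
    frame["result"] = value * 2


def demo():
    proc = Processor(memory={0: 10, 1: 20})
    frames = [{"id": i, "addr": i} for i in range(2)]
    for f in frames:
        proc.send(start, f)
    proc.run()
    return [f["result"] for f in frames], proc.trace
```

In this sketch, calling `demo()` runs both `start` threads before either `finish` thread, which is the interleaving that hides the read latency. Per-thread state lives in the `frame` dictionary, a loose analogue of the context that a real processor would have to load and store on each switch.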