Run-time Alias Disambiguation (RTD) has been proposed as a technique for pointer aliasing. This paper suggests several RTD approaches which may be used for DOACROSS scheduling to exploit coarse-grained parallelism. We analyze the rerollability problem that arises when those RTD approaches are transformed for software pipelining in order to exploit the instruction-level parallelism available in loops. Finally, we give some suggestions as to how to address the rerollability problem.
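The general idea behind a run-time alias check can be illustrated with a minimal sketch. It is not one of the paper's RTD approaches, which the abstract does not spell out; the function names and the use of index windows into one shared buffer to stand in for pointers are assumptions made for illustration. The loop is compiled in two versions, and a cheap overlap test at run time selects the one that is safe.

```python
def ranges_overlap(src, dst, n):
    """Run-time alias check: do [src, src+n) and [dst, dst+n) share an index?"""
    return src < dst + n and dst < src + n


def update_serial(buf, src, dst, n):
    # Reference loop: iteration i may read a value written by an earlier
    # iteration, so the iterations must run in order.
    for i in range(n):
        buf[dst + i] = buf[src + i] + 1


def update_independent(buf, src, dst, n):
    # Valid only when the two windows are disjoint: every read is done before
    # any write, an order a parallel or pipelined schedule is free to choose.
    values = [buf[src + i] + 1 for i in range(n)]
    buf[dst:dst + n] = values


def update_with_rtd(buf, src, dst, n):
    """Select a loop version at run time from the result of the alias check."""
    if ranges_overlap(src, dst, n):
        update_serial(buf, src, dst, n)
        return "serial"
    update_independent(buf, src, dst, n)
    return "independent"
```

With overlapping windows the two versions would give different results, which is why the check has to precede the choice of schedule.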
This paper traces the various explorations which have been made to demonstrate the performance improvements which can be obtained by implementing the compiling, translation and execution of higher level languages. In most instances computer architecture is based on instruction set design, and much of the high level language implementation is done by software compiling into the instruction set of the machine. During the mid-1960's, several papers dealt with language performance improvements for FORTRAN or ALGOL machines. These discussions generally restricted themselves to a narrow problem in the entire spectrum of compilation problems. None addresses the microprogrammed implementation of I/O functions, for example. More recently we see the use of microprogram architecture for special purpose processors such as Fast Fourier Transform machines or array processors, and speed-ups for basic functions such as arithmetic calculations in Fortran. With the possibility of microroutines to implement basic functions, there was a renewed interest in interpretive processes. This points up the need for a high level microprogramming language. We discuss some of the work done in this area. The area which has not yet been fully addressed is to what extent one can utilize microprogram architecture for the basic design of an effective high level language machine. In this paper we will outline the earlier work vis-à-vis language enhancements and then suggest the areas which must be further explored.
An oil/water, 3–D fully implicit Reservoir Simulator has been implemented on the ICL Distributed Array Processor (DAP) by a joint BNOC/ICL project team. The DAP is a single instruction, multiple data stream machine with 4096 processors in parallel. The major computational areas of code have been implemented on the DAP, leaving I/O and well calculations as serial code on the host 2976 machine. Matrix assembly is inherently parallel and can be implemented easily and efficiently on the DAP with compact code. Table look-up, though not obviously parallel, has been coded using a very efficient parallel algorithm. The Linear Solver, of Line Gauss-Seidel type, has been chosen for its efficiency on the DAP, with odd and even reduction used for solution of the resulting tri-diagonal equations. Further research will be undertaken to increase the robustness of this solver. The initial implementation has a 1-1 mapping of active grid blocks to processors, thus allowing up to 4096 active grid blocks. It is planned to extend to much larger models, and to 3 phase.

Introduction

A joint British National Oil Corporation/International Computers Ltd project was set up in May 1980 to run until October 1982 to implement an oil reservoir simulator on an ICL Distributed Array Processor (DAP). Major objectives of the project are: to produce a reservoir simulator which would be a cost effective working tool for BNOC engineers and potentially marketable to the oil industry; to assess the capability of the DAP and associated software to perform this task and to recommend enhancements and changes to ICL; and to develop expertise in parallel processing.

An initial study recommended that an existing simulator, PORES, be modified, with suitable sections of code being replaced by parallel DAP code. The parallel version has been called DARSI (DAp Reservoir SImulator).
The initial study concluded that DARSI on the DAP could be about 3 times faster than PORES on an IBM 3033U for suitable problems (approx. 4000 blocks). Because of memory limitations, DARSI would use the less robust of the two PORES solution options and therefore might not solve all problems solved by PORES. DARSI has so far achieved 1.5 times the speed of PORES on the IBM on a 2000 block problem. The problem with the less robust solution algorithm remains. This paper describes the first phase of the project, namely the implementation of the oil/water, three dimensional code.

THE MACHINE

DAP HARDWARE

The DAP is a single instruction multiple data stream machine. It has 4096 processors (called processing elements or pe's) in a 64 × 64 square with hard-wired neighbour connections. See Figure 1. Each processing element contains a 1 bit Arithmetic Logic Unit and 4K bits of store, making the total size of the machine 2 Megabytes. Data may be stored vertically (matrix mode) or horizontally (vector and scalar mode). See Figure 2. Instructions are broadcast to each pe by the Master Control Unit (MCU), which sequences through a set of instructions in the conventional manner. Each pe has an 'activity bit' which, when unset, 'switches off' the pe for that particular instruction; this allows selective use of a subset of the 4096 processors.
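The activity-bit mechanism and the store arithmetic above can be modelled in a few lines. This is only a sketch in ordinary Python, not DAP code: the names `broadcast` and `total_store_bytes`, the checkerboard mask, and the multiply-by-ten instruction are all assumptions chosen for illustration.

```python
SIDE = 64                  # 64 x 64 array of processing elements (4096 in all)
BITS_PER_PE = 4 * 1024     # 4K bits of store per processing element


def total_store_bytes():
    """4096 elements x 4096 bits each, expressed in bytes."""
    return SIDE * SIDE * BITS_PER_PE // 8


def broadcast(values, activity, op):
    """Apply one broadcast instruction `op` to every element whose activity
    bit is set; elements with the bit unset keep their old value."""
    return [
        [op(v) if active else v for v, active in zip(value_row, activity_row)]
        for value_row, activity_row in zip(values, activity)
    ]


# Hypothetical data: element (r, c) holds r + c, and only the elements on the
# even squares of a checkerboard are switched on for this instruction.
values = [[r + c for c in range(SIDE)] for r in range(SIDE)]
activity = [[(r + c) % 2 == 0 for c in range(SIDE)] for r in range(SIDE)]
result = broadcast(values, activity, lambda v: v * 10)
```

Here `total_store_bytes()` evaluates to 2 × 1024 × 1024, matching the 2 Megabytes quoted in the text, and the masked elements of `result` are left unchanged.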
This special section focuses primarily on high-level language concepts applicable to microprogramming. The notion of "microprogramming" is best explained in terms of more familiar levels of programming. Let us identify the configuration of the hardware system as the l0 level. Machine language programming (or "assembly language" coding) is at the l2 level. In between, at the l1 level, is the specification of instructions that serves to extend the existing hardware instruction set. Instructions written at this level provide a set of language commands, used by l2-level programmers, that are expected to execute in time frames typically identified with the initial hardware system. Thus, microprogramming lies somewhere between hardware and software, and is often dubbed "firmware."

While we have made a great deal of progress in language designs for the various levels at which programming is done, there is insufficient discussion of optimization techniques and specification techniques for programming at the l0 and l1 levels. One would assume that microprogrammed control and writable control stores would afford architects the opportunity to design hardware configurations that are innovative and suitable for specification techniques, such as iteration and conditional control structures, typical of high-level programming languages. But very little has actually been done in this regard. Microprogramming continues to be a convenient form of implementing traditional hardware configurations.

The form in which microprogramming is carried out is also quite traditional, despite the usual arguments against bit-level coding and the horrendous nature of the postcoding deciphering task. A number of new approaches to the design of microprogramming languages are needed. For example, some of the parallelism which exists in microprogramming must be handled in ways other than the concurrency techniques presently available in higher level programming systems.
In particular, microprograms must rigorously enforce controlled timing constraints. Register transfer operations are at the very heart of microprogramming, but only some hardware description languages contain appropriate commands for carrying out these operations. The search continues for a language that lies somewhere between a specialized hardware description language and the familiar problem-oriented languages.

New techniques are also needed for microprogram compilers. The microinstructions generated must be optimized in their use of space (i.e., to minimize the number of microinstructions) and in their associated execution times. Microinstructions typically consist of fields specifying two functions: control register transfer operations within the central processing unit, and next-address computations (toggles and flags) found within the microprogram control store. These second operations occur independently of the register transfer operations. The two papers presented in this special section deal primarily with the use and improvement of register transfer operations.
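The two-field structure of a microinstruction described above can be sketched as a toy model. Nothing here comes from the papers in the special section: the `MicroInstruction` layout, the flag name `Z`, the register names, and the two-entry control store are hypothetical, chosen only to show simultaneous register transfers alongside an independent next-address choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroInstruction:
    transfers: tuple       # register transfers (dst, src), all issued in one cycle
    next_if_set: int       # next control-store address when the tested flag is set
    next_if_clear: int     # next control-store address when the flag is clear
    flag: str = "Z"        # name of the flag the sequencer tests


def step(regs, flags, store, addr):
    """Execute the microinstruction at `addr` and return the next address."""
    mi = store[addr]
    # Read every source before writing any destination, so the transfers
    # behave as if they happened simultaneously.
    snapshot = {dst: regs[src] for dst, src in mi.transfers}
    regs.update(snapshot)
    # The next address depends on the flag only, not on the transfers.
    return mi.next_if_set if flags[mi.flag] else mi.next_if_clear


store = {
    0: MicroInstruction(transfers=(("A", "B"), ("B", "A")), next_if_set=2, next_if_clear=1),
    1: MicroInstruction(transfers=(("C", "A"),), next_if_set=2, next_if_clear=2),
}
regs = {"A": 1, "B": 2, "C": 0}
flags = {"Z": False}
addr = step(regs, flags, store, 0)     # swaps A and B in a single cycle
addr = step(regs, flags, store, addr)  # copies the new A into C
```

The swap in the first microinstruction is the kind of parallelism that a sequential high-level assignment statement cannot express directly, since two ordinary assignments in sequence would overwrite one of the values.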