We describe a task model and a dynamic scheduling and resource allocation mechanism for synchronous parallel tasks. The tasks are executed on SPMD-programmed synchronous shared-memory MIMD parallel architectures with uniform, unit-time memory access and strict memory consistency, also known in the literature as PRAMs (Parallel Random Access Machines). Our task model provides a two-tier programming model for PRAMs that flexibly combines SPMD and fork-join parallelism within the same application. Dynamic scheduling and late resource binding make it flexible, while the PRAM execution properties are preserved within each task. The only limitation is that the number of threads that can be assigned to a task is bounded by what the underlying architecture provides. In particular, our approach makes automatic performance tuning at run time possible by controlling the thread allocation for tasks based on run-time predictions. We implemented a prototype of a synchronous parallel task API in the SPMD-based PRAM language Fork and evaluated it experimentally with example programs on the SB-PRAM simulator. The results show that the task model can be realized on an SPMD-programmable PRAM machine with moderate run-time overhead per task.
1 Introduction

In recent years, computer architectures available on the consumer market have switched from single-core to multi-core designs, and it is reasonable to assume that we will enter the many-core era in the near future. The reason for this change is that hardware manufacturers are trying to meet the demand for more computational power while consuming less energy. As a consequence, legacy single-threaded programs no longer get faster for free; they must be rewritten to exploit many cores. Worse still, even where these new architectures provide a shared-memory abstraction, they mainly follow NUMA and SMP designs. These designs lack features that could ease parallel programming, such as strong memory consistency or deterministic execution.

To ease the burden on both application programmers and compiler engineers, some architecture projects [PBB+02, For10, WV08] are working towards support for more powerful, deterministic parallel programming models such as the PRAM model [FW78, KKT01]. The PRAM model is often regarded as a purely theoretical programming model. However, it was already realized in hardware in the 1990s, albeit not on a single chip, e.g., in the SB-PRAM [PBB+02, KKT01]. In a current project by VTT Oulu (Finland) a new architecture