The dataflow model of processing in general, and the recent direction of combining dataflow processing with control-flow processing in particular, provide attractive alternatives for satisfying the computational demands of new applications without the shortcomings of traditional concurrent systems. This should motivate researchers to analyze the applicability of familiar concepts within this new architectural framework: scheduling and load balancing. The run-time overhead of detecting and allocating dynamic parallelism in a program can easily offset the performance gain. However, the difficulty of accurately estimating run-time parallelism at compile time is a stumbling block for the static approach. As a compromise, we propose an allocation policy that detects dynamic parallelism for a selected group of program constructs at compile time and allocates them to the estimated hardware resources in a staggered fashion. The proposed staggered scheme is simulated and its performance is compared against other schemes proposed in the literature. The proposed scheme is shown to offer an order-of-magnitude performance improvement over cyclic distribution.
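For readers unfamiliar with the baseline named above, cyclic distribution is commonly understood as round-robin assignment: iteration i goes to processor i mod P. The following is a minimal sketch under that assumption, not the simulator or the staggered scheme from the paper; the function name `cyclic_distribution` and its parameters are hypothetical and chosen only for illustration.

```python
def cyclic_distribution(num_iterations, num_processors):
    """Round-robin baseline (assumed definition): iteration i -> processor i mod P."""
    # One list of iteration indices per processor.
    assignment = [[] for _ in range(num_processors)]
    for i in range(num_iterations):
        assignment[i % num_processors].append(i)
    return assignment


if __name__ == "__main__":
    # 10 loop iterations spread over 4 processors.
    for pe, iterations in enumerate(cyclic_distribution(10, 4)):
        print(f"PE{pe}: {iterations}")
```

Under this scheme consecutive iterations always land on different processors, so a loop-carried dependence between iteration i and iteration i+1 crosses a processor boundary every time. That may be one reason a different distribution pays off for such loops, although the abstract itself does not give the details.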
Within the scope of multithreaded dataflow, the problem of scheduling and allocating DOACROSS loops has been discussed, and it was shown that the so-called staggered allocation offers higher performance and resource utilization than other schemes described in the literature. The staggered scheme, however, produces an unbalanced load among processors. This paper introduces an extension to the staggered scheme, the cyclic staggered scheme, that produces a more balanced distribution of iterations among processors. The cyclic staggered scheme is simulated and its performance improvement is analyzed.
It has been shown that in many instances the run-time overhead of detecting and allocating dynamic parallelism in a program can easily offset the performance gain. Therefore, to improve performance and reduce run-time overhead, it would be logical to devise an allocation scheme that detects dynamic parallelism at compile time. However, the difficulty of accurately estimating run-time parallelism is a stumbling block in this direction. As a compromise, we propose an allocation policy that (i) detects dynamic parallelism for loop constructs at compile time and (ii) allocates them to the estimated hardware resources in a staggered fashion using a set of heuristic rules. This paper introduces the proposed Staggered Distribution Scheme and addresses its simulation and performance improvement.