This paper presents the results of evaluating the communication capabilities of the generalized hypercube interconnection network. The generalized hypercube has outstanding topological properties, but it has not been implemented on a large scale because of its very high wiring complexity. For this reason, the network has not been studied extensively in the past. However, recent and expected technological advances will soon render it viable for massively parallel systems. We first present implementations of randomized many-to-all broadcasting and multicasting on generalized hypercubes, using as their basis the one-to-all broadcast algorithm presented by Fragopoulou et al. (1996). We test the proposed implementations under realistic communication traffic patterns and message generation rates, for the all-port model of communication. Our results show that the size of the intermediate message buffers has a significant effect on the total communication time, and that this effect becomes dramatic for large systems with many dimensions. We also propose a modification of the multicast algorithm that applies congestion control to improve its performance. The results show a significant improvement in total execution time and a reduction in the number of message contentions, and they also demonstrate that the generalized hypercube is a very versatile interconnection network.
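To make the topology concrete, the following is a minimal Python sketch of a generalized hypercube GH(m_1, ..., m_n): nodes are coordinate tuples with 0 <= x_i < m_i, and two nodes are linked when they differ in exactly one coordinate, so each node has (m_1 - 1) + ... + (m_n - 1) links, which is the source of the high wiring cost. The broadcast shown is a simple dimension-ordered, all-port scheme chosen for illustration; it is not the Fragopoulou et al. (1996) algorithm, and the function names `gh_nodes`, `gh_neighbors`, and `dimension_ordered_broadcast` are hypothetical.

```python
from itertools import product


def gh_nodes(radices):
    """Return every node of GH(m_1, ..., m_n) as a coordinate tuple."""
    return list(product(*(range(m) for m in radices)))


def gh_neighbors(node, radices):
    """Return all nodes that differ from `node` in exactly one coordinate."""
    result = []
    for dim, m in enumerate(radices):
        for value in range(m):
            if value != node[dim]:
                result.append(node[:dim] + (value,) + node[dim + 1:])
    return result


def dimension_ordered_broadcast(source, radices):
    """Illustrative all-port one-to-all broadcast (not the paper's algorithm).

    In step `dim`, every informed node sends the message simultaneously on all
    of its links in dimension `dim`. Returns the list of steps, each a list of
    (sender, receiver) pairs, plus the final set of informed nodes.
    """
    informed = {source}
    steps = []
    for dim, m in enumerate(radices):
        sends = []
        for node in sorted(informed):
            for value in range(m):
                if value != node[dim]:
                    target = node[:dim] + (value,) + node[dim + 1:]
                    sends.append((node, target))
        # Receivers become informed only after the whole step completes.
        informed.update(target for _, target in sends)
        steps.append(sends)
    return steps, informed


if __name__ == "__main__":
    radices = (3, 4)
    steps, informed = dimension_ordered_broadcast((0, 0), radices)
    print(len(gh_neighbors((0, 0), radices)), len(steps), len(informed))
```

For GH(3, 4), each node has (3 - 1) + (4 - 1) = 5 links, and this scheme reaches all 12 nodes in two steps with 11 transmissions, one per receiving node. Because every node uses all of its ports in a given dimension at once, it reflects the all-port model but ignores buffering and contention, which are exactly the effects the abstract reports as significant.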
Most microprocessors introduced to the market in the past few years employ pipelining to enhance execution speed. Moreover, many of these processors use multiple pipelined functional units. This paper surveys several heuristics reported in the literature for code optimization and reordering that exploit instruction-level parallelism in pipelined processors. Five methods are described in detail, and several others are briefly reviewed.
This paper presents a compiler technique for exploiting instruction-level parallelism. The software pipelining algorithm concentrates on innermost loops in which array accesses dominate variable references. Dependence graphs labeled with either direction or distance information are provided as input to the pipelining algorithm. In the first step of the algorithm, a loop body of minimal schedule length is generated for a machine with infinite resources. In the second step, this schedule is mapped onto a processor with finite resources. This division into two steps allows existing DAG scheduling algorithms to be used for loop scheduling.
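The two-step idea can be illustrated on a single loop body. The following is a minimal Python sketch under strong assumptions: it treats the loop body as a DAG of intra-iteration dependences only (no loop-carried distances and no overlap between iterations, so it is not a full software pipeliner), assumes identical, fully pipelined functional units that accept one operation per cycle each, and uses ordinary ASAP scheduling followed by list scheduling as stand-ins for the paper's steps. The operation names, latencies, and the functions `asap_schedule`, `list_schedule`, and `schedule_length` are hypothetical.

```python
from collections import defaultdict


def _predecessors(deps):
    """Map each operation to the operations it depends on."""
    preds = defaultdict(list)
    for pred, succ in deps:
        preds[succ].append(pred)
    return preds


def asap_schedule(ops, deps):
    """Step 1 (infinite resources): start each op as soon as its inputs are ready.

    `ops` maps op name -> latency; `deps` is a list of (pred, succ) edges of an
    acyclic dependence graph. Returns op name -> start cycle.
    """
    preds = _predecessors(deps)
    start = {}

    def earliest(op):
        # Memoized longest-path computation over the DAG.
        if op not in start:
            start[op] = max((earliest(p) + ops[p] for p in preds[op]), default=0)
        return start[op]

    for op in ops:
        earliest(op)
    return start


def list_schedule(ops, deps, units):
    """Step 2 (finite resources): list scheduling with `units` issue slots per cycle.

    Ready ops are prioritized by their ASAP start time from step 1.
    """
    asap = asap_schedule(ops, deps)
    preds = _predecessors(deps)
    start, remaining, cycle = {}, set(ops), 0
    while remaining:
        ready = [op for op in remaining
                 if all(p in start and start[p] + ops[p] <= cycle for p in preds[op])]
        ready.sort(key=lambda op: (asap[op], op))
        for op in ready[:units]:
            start[op] = cycle
            remaining.discard(op)
        cycle += 1
    return start


def schedule_length(ops, start):
    """Cycle at which the last operation completes."""
    return max(start[op] + ops[op] for op in ops)


if __name__ == "__main__":
    # Hypothetical body of: c[i] = a[i] * b[i] + d[i]
    ops = {"load_a": 2, "load_b": 2, "load_d": 2, "mul": 3, "add": 1, "store": 1}
    deps = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"),
            ("load_d", "add"), ("add", "store")]
    for units in (1, 2):
        print(units, schedule_length(ops, list_schedule(ops, deps, units)))
```

With these assumed latencies, step 1 yields a 7-cycle body; mapping it onto one functional unit stretches it to 8 cycles, while two units recover the 7-cycle bound. Separating the steps this way means the resource-constrained part is an ordinary DAG scheduling problem, which is the motivation the abstract gives.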