“…The list of nodes is generated at compile time and the cost of running tasks on every PE is known in advance. Some relevant works on this area are [20,11,14,15]. However, communication cost in heterogeneous multicore systems is several orders of magnitude smaller than in distributed systems.…”
Abstract. Heterogeneous architectures are currently widespread. With the advent of easy-to-program general purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU brings great amounts of processing power. However, such architectures are often used in a restricted way for domain-specific applications like scientific applications and games, and they tend to be used by a single application at a time. We envision future heterogeneous computing systems whose resources are continuously utilized by different applications with versioned critical parts, so that the applications can better adapt their behavior and improve execution time, power consumption, response time, and other constraints at runtime. Under such a model, adaptive scheduling becomes a critical component. In this paper, we propose a novel predictive user-level scheduler for heterogeneous systems, based on past performance history. We developed several scheduling policies and present a study of their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems and consistently achieve speedups ranging from 30% to 40% compared to just using the GPU in a single-application mode.
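The abstract does not spell out the scheduling policies, so the following is only a minimal sketch of what a history-based predictive policy could look like. It assumes a per-(task, processing element) running average of past runtimes and a greedy "earliest predicted finish" rule. The class `HistoryScheduler`, its methods, and the task name `matmul` are hypothetical and not taken from the paper.

```python
class HistoryScheduler:
    """Toy predictive scheduler: picks the PE with the earliest predicted finish time."""

    def __init__(self, pes):
        self.pes = list(pes)
        self.history = {}                              # (task, pe) -> list of observed runtimes
        self.busy_until = {pe: 0.0 for pe in self.pes}  # predicted time each PE becomes free

    def record(self, task, pe, runtime):
        """Store one observed runtime of `task` on `pe`."""
        self.history.setdefault((task, pe), []).append(runtime)

    def predict(self, task, pe):
        """Mean of past runtimes, or None if the task has never run on this PE."""
        runs = self.history.get((task, pe), [])
        return sum(runs) / len(runs) if runs else None

    def dispatch(self, task, now=0.0):
        """Choose a PE for `task` and update that PE's predicted busy time."""
        # Assumption: try any PE without history first, so a prediction exists later.
        for pe in self.pes:
            if self.predict(task, pe) is None:
                return pe
        # Otherwise pick the PE whose queue drains soonest, including this task.
        pe = min(
            self.pes,
            key=lambda p: max(self.busy_until[p], now) + self.predict(task, p),
        )
        self.busy_until[pe] = max(self.busy_until[pe], now) + self.predict(task, pe)
        return pe


sched = HistoryScheduler(["cpu", "gpu"])
sched.record("matmul", "cpu", 8.0)   # hypothetical past measurements
sched.record("matmul", "gpu", 2.0)
picks = [sched.dispatch("matmul") for _ in range(5)]
print(picks)
```

In this toy run the GPU is four times faster, yet the fourth task goes to the CPU because the GPU queue has grown long enough for the CPU to finish no later. That is the kind of effect that lets both processors stay busy rather than leaving the CPU idle.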
“…Stone's model [44] is formulated to find an optimal assignment of program modules onto a two-processor distributed computer system to minimize the cost of intermodule reference and running which can be represented as follows:…”
Stony Brook University
The official electronic file of this thesis or dissertation is maintained by the University Libraries on behalf of The Graduate School at Stony Brook University.
2008. This thesis focuses on techniques of task mapping for solving problems on parallel computers with hundreds of thousands of processors on cellular networks. Task mapping is a serious intellectual challenge and a practical tool for unleashing the potential power of supercomputers. It is challenging because of both the astronomical search space and the high dependence on the exact nature of the applications and the computers. In this thesis, we propose two general static mapping models to optimize the assignment of tasks on heterogeneous, distributed-memory, ultra-scalable computers. In our models, the underlying application problems can be appropriately decomposed into subtasks with known computational load and known inter-task communication demands. We also know, or can conveniently measure, the computing systems' specifications such as individual processor speed and inter-processor communication cost. Our models abstract an application as a demand matrix and a parallel computer as a load matrix and a supply matrix, with which we formulate the mapping as minimizing the objective function value for completing the application on the given computer. We have tested several applications on the Blue Gene/L supercomputer with 3D mesh and torus networks. For a 2D wave equation, the mappings generated by our models reduced communication by 51% for 3D-mesh and 31% for 3D-torus over the default MPI rank order mapping. For the SMG2000 application, our mapping can reduce communication and total time by 16% and 5% over the default MPI rank order mapping, respectively. For NPB MG, we improve the communication time and benchmark result by 53% and 13%, respectively. For NPB CG, we improve the communication time and benchmark result by 43% and 22%, respectively. We believe that our models are useful for task assignment for broad applications on a family of supercomputers with cellular networks.
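The abstract names a demand matrix and a supply matrix but does not give the objective function, so the sketch below is only an illustration of the general idea. It assumes that the communication part of the cost is the sum of inter-task demand multiplied by the inter-processor cost of the processors the two tasks land on. It ignores the load matrix entirely. The functions `comm_cost` and `best_mapping`, the 4-task demand values, and the 1D chain of processors are all hypothetical.

```python
from itertools import permutations


def comm_cost(demand, supply, mapping):
    """Sum of demand[i][j] * supply[p_i][p_j] over task pairs i < j,
    where mapping[i] is the processor that task i is assigned to."""
    n = len(demand)
    return sum(
        demand[i][j] * supply[mapping[i]][mapping[j]]
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_mapping(demand, supply):
    """Exhaustive search over one-to-one mappings; feasible only for toy sizes."""
    n = len(demand)
    return min(permutations(range(n)), key=lambda m: comm_cost(demand, supply, m))


# Hypothetical demand: tasks 0 and 3 exchange 10 units, tasks 1 and 2 exchange 10 units.
demand = [
    [0, 0, 0, 10],
    [0, 0, 10, 0],
    [0, 10, 0, 0],
    [10, 0, 0, 0],
]
# Assumed supply: hop distance between processors arranged in a 1D chain.
supply = [[abs(p - q) for q in range(4)] for p in range(4)]

rank_order = (0, 1, 2, 3)          # task i on processor i
optimized = best_mapping(demand, supply)

print(comm_cost(demand, supply, rank_order))  # cost of the rank-order mapping
print(comm_cost(demand, supply, optimized))   # cost of the best mapping found
```

Exhaustive search is used here only because the example has four tasks. With hundreds of thousands of processors the search space is astronomical, as the abstract notes, which is why a real mapper needs a heuristic or optimization model instead.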
“…There are several theoretical analyses of the task assignment problem. Some approaches consider a graph formed by system nodes together with tasks as vertices and communication costs together with execution costs as edges without considering a multi-hop network topology ([2,3,4]). Other research deals with multi-hop networks with a complex topology ([5,6,7,8]).…”
Abstract. This paper presents a basic and an extended heuristic to distribute operating system (OS) services over mobile ad hoc networks. The heuristics are inspired by the foraging behavior of ants and are used within our NanoOS, an OS for distributed applications. The NanoOS offers a uniform execution environment, and the code of the OS is distributed among nodes. We propose a basic and an extended swarm-optimization-based heuristic to control service migration in order to reduce the communication overhead. In the basic one, each service request leaves pheromone in the nodes on its path to the service provider (like ants leave pheromone when foraging). An optimization step occurs when the service provider migrates to the neighbor node with the highest pheromone concentration. The proposed extension takes into account the position of the node in the network and its energy. Simulations have shown that the basic heuristic performs well: the total communication cost is on average just 40% higher than the global optimum. In addition, both heuristics have a low computational requirement.
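The basic heuristic described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NanoOS implementation: each request deposits a fixed amount of pheromone on every node of its path except the provider, pheromone never evaporates, and the provider stays put when no neighbor has any pheromone. The function names, the deposit amount, and the five-node topology are hypothetical.

```python
from collections import defaultdict


def route_request(path, pheromone, deposit=1.0):
    """Leave pheromone on every node the request crosses on its way to the provider.
    `path` lists nodes from requester to provider; the provider itself gets none."""
    for node in path[:-1]:
        pheromone[node] += deposit


def migrate(provider, neighbors, pheromone):
    """Optimization step: move the service to the neighbor with the highest
    pheromone concentration, or stay if no neighbor has any pheromone."""
    candidates = neighbors[provider]
    if not candidates:
        return provider
    best = max(candidates, key=lambda n: pheromone[n])
    return best if pheromone[best] > 0 else provider


# Hypothetical chain topology a - b - c - d - e, with the service hosted on d.
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
pheromone = defaultdict(float)

for _ in range(3):                       # node a issues three requests
    route_request(["a", "b", "c", "d"], pheromone)
route_request(["e", "d"], pheromone)     # node e issues one request

new_provider = migrate("d", neighbors, pheromone)
print(new_provider)
```

Because most requests arrive through c, the service moves one hop toward the heavier requester, which shortens the paths of the majority of requests. The extended heuristic in the abstract additionally weighs node position and energy, which this sketch does not model.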