State-of-the-art codelet scheduling focuses on dynamic workload balancing of codelets (which are similar to tasks). While this approach may achieve reasonable performance because computation resources are fully utilized, it may not attain optimal energy savings. In this paper, targeting the IBM Cyclops-64 manycore system, we propose a novel polynomial-time algorithm that finds the optimal codelet schedule in terms of maximum locality and minimum global memory accesses. Our algorithm leverages static information about locality among codelets to achieve better performance and energy efficiency. By using local buffers to pass data produced by one codelet to another, global memory accesses can be greatly reduced. Experimental results on the IBM Cyclops-64 emulator we developed show that the schedule produced by our algorithm removes up to 59.7% of global memory accesses, improves performance by up to 68.1%, and reduces energy consumption by up to 40.7% compared to state-of-the-art codelet scheduling.
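The locality idea can be illustrated with a toy cost model. The sketch below is not the algorithm proposed in the paper; it only counts global memory traffic for a given assignment of codelets to cores. It assumes that data passed between two codelets on the same core stays in a local buffer, and that any other transfer costs one global write plus one global read per word. The names `global_accesses`, `edges`, and `core_of`, and the sample graph, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical toy model: edges[(producer, consumer)] = words passed between them.
Edges = Dict[Tuple[str, str], int]


def global_accesses(edges: Edges, core_of: Dict[str, int]) -> int:
    """Count global memory accesses implied by a codelet-to-core assignment.

    Assumption: a producer and consumer on the same core exchange data through
    a local buffer (no global traffic); otherwise each word costs one global
    write by the producer and one global read by the consumer.
    """
    total = 0
    for (producer, consumer), words in edges.items():
        if core_of[producer] != core_of[consumer]:
            total += 2 * words
    return total


# Illustrative dependency graph with three codelets.
edges = {("A", "B"): 100, ("B", "C"): 50, ("A", "C"): 10}

# Assignment that ignores locality: every codelet on its own core.
spread = {"A": 0, "B": 1, "C": 2}

# Assignment that keeps the heaviest producer/consumer pair on one core.
local = {"A": 0, "B": 0, "C": 1}

print(global_accesses(edges, spread))
print(global_accesses(edges, local))
```

Under this model, co-locating the codelets that exchange the most data lowers the count, which is the effect a locality-aware schedule aims for.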