The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in December 2017.
Summary
Binding parallel tasks to cores according to a placement policy is one of the key aspects to achieve good performance in multicore machines because it can reduce on‐chip communication among parallel threads. Binding also prevents operating system from migrating threads, which improves data locality. However, there is no single mapping policy that works best among all different kinds of applications and platforms because each machine has a different topology and each application exhibits different communication pattern. Determining the best policy for a given application and machine requires extra programming effort. To relieve the programmer from that burden, we introduce BindMe, a thread binding library that assists programmer to bind threads to underlying hardware. BindMe incorporates state‐of‐the‐art mapping algorithms, which use communication pattern in an application to formulate an efficient task placement policy. We also introduce ChoiceMap, a communication aware mapping algorithm that respects mutual priorities of parallel tasks and performs a fair mapping by reducing communication volume among cores. We have tested BindMe and ChoiceMap with various applications from NAS parallel benchmark and Rodinia bechmark. Our results show that choosing a mapping policy that best suits the application behavior can increase its performance and no single policy gives the best performance across different applications.
The LEGaTO project leverages task-based programming models to provide a software ecosystem for Made in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC, balanced with the security and resilience challenges. LEGaTO is an ongoing three-year EU H2020 project started in December 2017.
Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e. flexibility to work as either single-threaded or multithreaded task) and explicit knowledge on task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applications.CCS Concepts • Computer systems organization → Multicore architectures.
Chiplets have become a common methodology in modern chip design. Chiplets improve yield and enable heterogeneity at the level of cores, memory subsystem and the interconnect. Convolutional Neural Networks (CNNs) have high computational, bandwidth and memory capacity requirements owing to the increasingly large amount of weights. Thus to exploit chiplet-based architectures, CNNs must be optimized in terms of scheduling and workload distribution among computing resources. We propose Shisha, an online approach to generate and schedule parallel CNN pipelines on chiplet architectures. Shisha targets heterogeneity in compute performance and memory bandwidth and tunes the pipeline schedule through a fast online exploration technique. We compare Shisha with Simulated Annealing, Hill Climbing and Pipe-Search. On average, the convergence time is improved by ∼ 35× in Shisha compared to other exploration algorithms. Despite the quick exploration, Shisha's solution is often better than that of other heuristic exploration algorithms.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.