Topology-aware GPU scheduling for learning workloads in cloud environments

Recent advances in hardware, such as systems with multiple GPUs and their availability in the cloud, are enabling deep learning in various domains including health care, autonomous vehicles, and the Internet of Things. Multi-GPU systems exhibit complex connectivity among GPUs and between GPUs and CPUs. Workload schedulers must consider hardware topology and workload communication requirements in order to allocate CPU and GPU resources for optimal execution time and improved utilization in shared cloud environments. This paper presents a new topology-aware workload placement strategy to schedule deep learning jobs on multi-GPU systems. The placement strategy is evaluated with a prototype on a Power8 machine with Tesla P100 cards, showing speedups of up to ≈1.30x compared to state-of-the-art strategies; the proposed algorithm achieves this result by allocating GPUs that satisfy workload requirements while preventing interference. Additionally, a large-scale simulation shows that the proposed strategy provides higher resource utilization and performance in cloud systems.

IBM Deep Learning Service

Bhattacharjee

Boag

Doshi

et al. 2017

IBM J. Res. & Dev.

Extreme scale computing: Modeling the impact of system noise in multicore clustered systems

Seelam

Fong

Tantawi

et al. 2010

Early experiences in application level I/O tracing on blue gene systems

Seelam

Chung

Hong

et al. 2008

Building the IBM Containers cloud service

et al. 2016

New Metrics for Scheduling Jobs on Cluster of Virtual Machines

Liu

Bobroff

Fong

et al. 2011

vPFS: Bandwidth virtualization of parallel storage systems

Arteaga

Zhao

et al. 2012

Existing parallel file systems are unable to differentiate I/O requests from concurrent applications and meet per-application bandwidth requirements. This limitation prevents applications from meeting their desired Quality of Service (QoS) as high-performance computing (HPC) systems continue to scale up. This paper presents vPFS, a new solution to address this challenge through a bandwidth virtualization layer for parallel file systems. vPFS employs user-level parallel file system proxies to interpose requests between native clients and servers and to schedule parallel I/Os from different applications based on configurable bandwidth management policies. vPFS is designed to be generic enough to support various scheduling algorithms and parallel file systems. Its utility and performance are studied with a prototype that virtualizes PVFS2, a widely used parallel file system. Enhanced proportional sharing schedulers are enabled based on the unique characteristics (parallel striped I/Os) and requirement (high throughput) of parallel storage systems. The enhancements include new threshold- and layout-driven scheduling synchronization schemes that reduce global communication overhead while delivering total-service fairness. An experimental evaluation using typical HPC benchmarks (IOR, NPB BTIO) shows that the throughput overhead of vPFS is small (< 3% for write, < 1% for read). It also shows that vPFS can achieve good proportional bandwidth sharing (> 96% of target sharing ratio) for competing applications with diverse I/O patterns.

Fairness and Performance Isolation: an Analysis of Disk Scheduling Algorithms

Seelam

Teller

2006

An I/O system using a sharing model provides concurrently executing applications shared access to the underlying I/O resources. Although the existing sharing models of these I/O systems are purported to be fair, none of them results in performance isolation. Failing to provide performance isolation results in unpredictable application performance. Unpredictability in application performance hampers providing quality of service guarantees. In this paper, we present a formal analysis of the fairness properties of various disk scheduling algorithms and an experimental evaluation of their performance isolation properties. We show that none of the existing "fair" scheduling algorithms provides performance isolation.

Seetharami Seelam

Topology-aware GPU scheduling for learning workloads in cloud environments
