Pak Markthub scite author profile

In heterogeneous supercomputers such as TSUBAME2.5, GPUs on some nodes in GPU batch queues are left idle even though jobs are waiting in the queues; this is caused by the GPU resource-assignment fragmentation problem. For example, when each node has three GPUs, as on TSUBAME2.5, a node that has already been assigned to a job requesting two GPUs per node cannot be assigned to another job requesting more than one GPU per node until the ongoing job finishes; hence, one GPU is left idle on that node. We examine this problem on TSUBAME2.5's GPU batch-queue system and present a scheduling algorithm that assigns rCUDA (a remote CUDA execution technology) to some processes of some jobs. Because rCUDA allows jobs to use the idle GPUs, the proposed scheduling algorithm can alleviate the problem. Using a job pattern obtained from a scheduler log of a TSUBAME2.5 GPU queue, our simulation shows that the proposed algorithm can reduce job lifetime (the time from a job's arrival until it finishes) by about 5% on average. Moreover, it can reduce the average number of idle GPUs by about 15%. Furthermore, even with about 4% fewer nodes serving jobs, the proposed algorithm keeps the average job lifetime roughly the same as under the scheduling algorithm currently used in the TSUBAME2.5 GPU queue.
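The fragmentation example in the abstract can be illustrated with a small sketch. This is not the paper's scheduling algorithm; the function names and the four-node scenario are hypothetical, and the pooled case simply assumes that any idle GPU can be reached remotely (as rCUDA permits), ignoring network cost.

```python
# Illustrative sketch only: function names and the scenario are assumptions,
# not the scheduling algorithm described in the abstract.

GPUS_PER_NODE = 3  # per-node GPU count stated in the abstract for TSUBAME2.5


def free_gpus(used_per_node, gpus_per_node=GPUS_PER_NODE):
    """Return the number of unassigned GPUs on each node."""
    return [gpus_per_node - used for used in used_per_node]


def nodes_servable_locally(free_per_node, request_per_node):
    """Count nodes that can host one process group using only local GPUs."""
    return sum(1 for free in free_per_node if free >= request_per_node)


def stranded_gpus(free_per_node, request_per_node):
    """Count free GPUs on nodes that cannot satisfy the request locally."""
    return sum(free for free in free_per_node if free < request_per_node)


def groups_servable_pooled(free_per_node, request_per_node):
    """Count process groups servable if idle GPUs are pooled across nodes.

    Assumption: every idle GPU is reachable remotely, at no modeled cost.
    """
    return sum(free_per_node) // request_per_node


# Four nodes, each already running a job that uses two of its three GPUs.
free = free_gpus([2, 2, 2, 2])

# A waiting job requests two GPUs per node.
local = nodes_servable_locally(free, 2)
stranded = stranded_gpus(free, 2)
pooled = groups_servable_pooled(free, 2)
```

In this scenario no node can serve the waiting job locally, so all four free GPUs sit idle, whereas pooling them would cover two of the job's two-GPU process groups.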


Serving More GPU Jobs, with Low Penalty, Using Remote GPU Execution and Migration

Markthub

Nomura

Matsuoka

2016



Pak Markthub

DRAGON: Breaking GPU Memory Capacity Limits with Direct NVM Access

Using rCUDA to Reduce GPU Resource-Assignment Fragmentation Caused by Job Scheduler

Serving More GPU Jobs, with Low Penalty, Using Remote GPU Execution and Migration
