2007 IEEE International Parallel and Distributed Processing Symposium
DOI: 10.1109/ipdps.2007.370581
Dynamic Load Balancing of Unbalanced Computations Using Message Passing

Abstract: This paper examines MPI's ability to support continuous, dynamic load balancing for unbalanced parallel applications. We use an unbalanced tree search benchmark (UTS) to compare two approaches: 1) work sharing using a centralized work queue, and 2) work stealing using explicit polling to handle steal requests. Experiments indicate that in addition to a parameter defining the granularity of load balancing, message-passing paradigms require additional parameters such as polling intervals to manage runtime overhead…
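The work-stealing approach the abstract describes can be illustrated with a minimal sketch. This is not the paper's MPI implementation: the worker, mailbox, and `polling_interval` names below are hypothetical, and plain Python queues stand in for MPI message passing. The key idea it demonstrates is the polling-interval parameter: a busy worker only checks for steal requests every N tasks, amortizing the cost of serving idle peers.

```python
import queue

# Work stealing with explicit polling: each worker owns a local work list
# and, every `polling_interval` tasks, checks a mailbox for steal requests
# from idle peers. In the paper's setting these would be MPI ranks
# exchanging messages; here queues stand in for the message layer.

class Worker:
    def __init__(self, wid, polling_interval=8):
        self.wid = wid
        self.local = []                  # local work list (deque-like)
        self.mailbox = queue.Queue()     # incoming steal requests
        self.polling_interval = polling_interval
        self.done = 0                    # tasks processed locally

    def serve_steals(self):
        # Answer every pending steal request with half of the local work
        # (an empty list if there is nothing to spare).
        while not self.mailbox.empty():
            reply_box = self.mailbox.get()
            half = len(self.local) // 2
            reply_box.put(self.local[:half])   # send a copy of the front half
            del self.local[:half]

    def run(self, process_task):
        since_poll = 0
        while self.local:
            process_task(self.local.pop())
            self.done += 1
            since_poll += 1
            if since_poll >= self.polling_interval:
                self.serve_steals()   # polling interval bounds this overhead
                since_poll = 0
        self.serve_steals()           # drain any last requests before idling

def steal(victim):
    # Receiver-initiated: an idle worker posts a request and waits for the reply.
    reply_box = queue.Queue()
    victim.mailbox.put(reply_box)
    return reply_box.get()
```

A larger `polling_interval` lowers per-task overhead but delays responses to idle workers, which is exactly the trade-off the abstract highlights.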

Cited by 55 publications (45 citation statements)
References 15 publications (13 reference statements)
“…Parallel implementation of the search requires continuous dynamic load balancing to keep all processors engaged in the search. Our implementation achieves better scaling and parallel efficiency in both shared memory and distributed memory settings than previous efforts using UPC [1] and MPI [2]. We observe parallel efficiency of 80% using 1024 processors performing over 85,000 total load balancing operations per second continuously.…”
Section: Introduction
confidence: 78%
“…The contributions are streamlined termination detection, rapid diffusion of work, and an asynchronous request-response protocol for work stealing that minimizes overheads to threads performing useful work. This last contribution was inspired by an MPI implementation of UTS [2], but exploits UPC's one-sided communication operations.…”
Section: Introduction
confidence: 99%
“…Follow-up work refines these comparisons by considering delays in the system [33] and different job scheduling policies [13]. More recently, Dinan et al. [14] compare work stealing (receiver-initiated) and work sharing (sender-initiated) implemented on top of the MPI message-passing interface, using the unbalanced tree search benchmark. These papers find that both algorithms perform quite well, with no clear winner, and that specifics such as delays, system load, and job scheduling and preemption policies can make one preferable over the other.…”
Section: Related Work
confidence: 99%
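The sender-initiated side of the contrast drawn above, work sharing through a centralized queue, can be sketched just as briefly. Again this is an illustration only, not the paper's MPI code; the `RELEASE_THRESHOLD` name and helper functions are hypothetical, with a Python queue standing in for the central queue manager.

```python
import queue

# Work sharing with a centralized queue: a worker holding more than
# RELEASE_THRESHOLD local tasks pushes the surplus to the shared queue
# (sender-initiated); an idle worker pulls from it instead of asking a
# specific peer, as a thief would in work stealing.

shared_queue = queue.Queue()
RELEASE_THRESHOLD = 4   # granularity knob, analogous to the paper's chunk size

def share_surplus(local):
    # A busy worker releases everything beyond the threshold.
    while len(local) > RELEASE_THRESHOLD:
        shared_queue.put(local.pop())

def acquire(local):
    # An idle worker refills from the central queue; returns False when
    # no shared work is available (a global termination hint).
    try:
        local.append(shared_queue.get_nowait())
        return True
    except queue.Empty:
        return False
```

The structural difference is visible even at this scale: the central queue is a single point of contention that every worker touches, whereas in work stealing communication happens only between a thief and its victim.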
“…For our scaling experiment we chose the Unbalanced Tree Search (UTS) application [99,100]. The benchmark contains a reference implementation using MPI in the publicly available version [101].…”
Section: UTS Case Study
confidence: 99%
“…The reference MPI implementation of the benchmark, used as the baseline for creating the HCMPI version, performed parallel search using multiple MPI processes, and load balancing using inter-process work-sharing or work-stealing algorithms. In our experiments we have focused on the work-stealing version due to better scalability [100]. We scale our experiment up to 16,384 cores on the Jaguar supercomputer.…”
Section: UTS Case Study
confidence: 99%