Parallel Branch and Bound on a CPU-GPU System

Boukedjar, Abdelamine; Lalami, Mohamed Esseghir; Baz, Didier El

doi:10.1109/pdp.2012.23

Cited by 22 publications

(32 citation statements)

References 14 publications

“…We shall see in Section IV that the improvements we propose in this paper have permitted us to increase substantially speedup as compared with the one obtained in our previous work (see [13]). …”

Section: Implementation on a CPU-GPU System (mentioning)

confidence: 60%

“…The speedup obtained with this implementation is generally twice as much as the one in [13], where noncoalesced global memory accesses may occur in conditional part of codes, leading to poor efficiency.…”

Section: Computational Results (mentioning)

confidence: 94%

“…However, only operations with registers are included in the conditional part of this new version of our code unlike operations of writing in global memory that were used in our previous work. This permits us in particular to be more efficient than with previous kernel proposed in [13]. …”

Section: Branching (mentioning)

confidence: 99%

“…In [13], we have proposed a first parallel implementation of the branch and bound algorithm on a CPU-GPU system via CUDA. Experiments carried out on a system with a 3 Ghz Xeon Quadro INTEL processor and a Tesla C2050 GPU have shown a speedup of 9 as compared with results obtained on a single core of the CPU.…”

Section: Introduction and Related Work (mentioning)

confidence: 99%

GPU Implementation of the Branch and Bound Method for Knapsack Problems

Lalami

Baz

2012

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum

Self Cite

Abstract: In this paper, we propose an efficient implementation of the branch and bound method for knapsack problems on a CPU-GPU system via CUDA. Branch and bound computations can be carried out either on the CPU or on a GPU according to the size of the branch and bound list. A better management of GPUs memories, less GPU-CPU communications and better synchronization between GPU threads are proposed in this new implementation in order to increase efficiency. Indeed, a series of computational results is displayed and analyzed showing a substantial speedup on a Tesla C2050 GPU.
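
The abstract above says that branch and bound steps run on the CPU or on the GPU depending on the size of the branch and bound list. The following is a minimal Python sketch of that dispatch idea, not the authors' CUDA implementation. The names `Node`, `upper_bound`, `expand`, `solve` and the parameter `gpu_threshold` are hypothetical, the bound is the standard fractional (linear relaxation) bound, which the abstract does not confirm, and the "gpu" branch is only recorded in a trace rather than executed on a device. Weights are assumed to be positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    level: int   # index of the next item to decide
    weight: int  # total weight of the items taken so far
    value: int   # total value of the items taken so far


def upper_bound(node, weights, values, capacity):
    """Fractional relaxation bound; items must be sorted by value/weight, descending."""
    room = capacity - node.weight
    bound = float(node.value)
    for i in range(node.level, len(weights)):
        if weights[i] <= room:
            room -= weights[i]
            bound += values[i]
        else:
            bound += values[i] * room / weights[i]
            break
    return bound


def expand(nodes, weights, values, capacity):
    """Branch every node of the list on its next item (take it if it fits, or skip it)."""
    children = []
    for n in nodes:
        i = n.level
        if n.weight + weights[i] <= capacity:
            children.append(Node(i + 1, n.weight + weights[i], n.value + values[i]))
        children.append(Node(i + 1, n.weight, n.value))
    return children


def solve(weights, values, capacity, gpu_threshold=4):
    """Breadth-first branch and bound; returns (best value, per-level dispatch trace)."""
    # Assumption: all weights are positive, so the ratio below is well defined.
    order = sorted(range(len(weights)), key=lambda i: values[i] / weights[i], reverse=True)
    w = [weights[i] for i in order]
    v = [values[i] for i in order]
    best = 0
    nodes = [Node(0, 0, 0)]
    trace = []
    for _ in range(len(w)):
        if not nodes:
            break
        # Hypothetical dispatch rule: large lists would go to the GPU, small ones stay on the CPU.
        trace.append("gpu" if len(nodes) >= gpu_threshold else "cpu")
        children = expand(nodes, w, v, capacity)
        # Every child is feasible, so its value is a valid lower bound.
        best = max([best] + [c.value for c in children])
        # Prune children whose bound cannot beat the best known value.
        nodes = [c for c in children if upper_bound(c, w, v, capacity) > best]
    return best, trace
```

With weights `[2, 3, 4, 5]`, values `[3, 4, 5, 6]` and capacity 5, `solve` returns `(7, ['cpu', 'cpu'])`: the list never reaches the default threshold, so both levels are labeled as CPU work.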

“…However, because of the irregularity of the B&B tree search, scheduling the load inside a GPU device is not fully compatible with the underlying SIMD programming model. In [10,16,39,7,34,11], implementations on a single GPU device are presented with respect to specific optimization problems, e.g., Flowshop, Knapsack and TSP (Traveling salesman). With respect to this paper, these studies are rather of limited interest since they are problem specific and do not consider a heterogeneous and large scale setting.…”

Section: Parallel B&B with GPUs (mentioning)

confidence: 99%

Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments

Derbel

2016

Future Generation Computer Systems

We investigate the design of parallel B&B in large scale heterogeneous compute environments where processing units can be composed of a mixture of multiple shared memory cores, multiple distributed CPUs and multiple GPUs devices. We describe two approaches addressing the critical issue of how to map B&B workload with the different levels of parallelism exposed by the target compute platform. We also contribute a throughout large scale experimental study which allows us to derive a comprehensive and fair analysis of the proposed approaches under different system configurations using up to 16 GPUs and up to 512 distributed cores. Our results shed more light on the main challenges one has to face when tackling B&B algorithms while describing efficient techniques to address them. In particular, we are able to obtain linear speed-ups at moderate scales where adaptive load balancing among the heterogeneous compute resources is shown to have a significant impact on performance. At the largest scales, intranode parallelism and hybrid decentralized load balancing is shown to have a crucial importance in order to alleviate locking issues among shared memory threads and to scale the distributed resources while optimizing communication costs and minimizing idle times.

GPU‐based branch‐and‐bound method to solve large 0‐1 knapsack problems with data‐centric strategies

Shen

Shigeoka

Ino

et al. 2018

Concurrency and Computation

An out-of-core branch-and-bound (B&B) method to solve large 0-1 knapsack problems on a graphics processing unit (GPU) is proposed. Given a large problem that produces many subproblems, the proposed method dynamically swaps subproblems to CPU memory. Because such a CPU-centric subproblem management scheme increases CPU-GPU data transfer, we adopt three data-centric strategies to eliminate this side effect. The first is an out-of-order search (O3S) strategy that reduces the data transfer overhead by adaptively transferring subproblems between the CPU and GPU. The second is an explicitly-managed pipelining strategy that hides the data transfer overhead by overlapping data transfer with GPU-based B&B operations. The third is a GPU-based stream compaction strategy that reduces the sparseness of arrays to be transferred. Experimental results demonstrate that the proposed out-of-core method stored 41 times as many subproblems as a previous in-core method that manages subproblems in GPU memory, solving approximately twice as many problem instances on the GPU. In addition, compared to a previous breadth-first search (BFS) strategy, the proposed O3S strategy achieved an average speedup of 7.5 times.

Keywords: branch and bound, data-centric method, GPU, knapsack, out-of-core computation

Figure 3: Array-based subproblem management scheme. Pruned subproblems (i.e., passive; X) make the array sparse as the depth of the search tree increases. Note that the branching, bounding, and pruning operations are performed on the GPU.

How to cite this article: Shen J, Shigeoka K, Ino F, Hagihara K. GPU-based branch-and-bound method to solve large 0-1 knapsack problems with data-centric strategies. Concurrency Computat Pract Exper. 2019;31:e4954.
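
The abstract above mentions a stream compaction strategy that reduces the sparseness of the subproblem arrays before they are transferred. A minimal sequential Python sketch of the general technique (flags, exclusive prefix sum, scatter) is shown here. It is not the authors' GPU kernel: the function name `compact` and the `active` flag list are hypothetical, and on a GPU the scan and scatter steps would run in parallel.

```python
from itertools import accumulate


def compact(subproblems, active):
    """Keep only the active entries, preserving their order.

    Uses the classic three steps of stream compaction:
    flags -> exclusive prefix sum -> scatter.
    """
    flags = [1 if a else 0 for a in active]
    if not flags:
        return []
    # Exclusive prefix sum: offsets[i] is the number of active entries before index i.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [None] * sum(flags)
    # Scatter: each active entry writes itself to its compacted position.
    for i, item in enumerate(subproblems):
        if flags[i]:
            out[offsets[i]] = item
    return out
```

For example, `compact(["a", "b", "c", "d", "e"], [True, False, True, False, True])` yields `["a", "c", "e"]`, so only the three surviving subproblems would need to be transferred.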
