Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments

Vu, Trong-Tuan; Derbel, Bilel

doi:10.1016/j.future.2015.10.009

Cited by 29 publications

(29 citation statements)

References 45 publications


“…The PFSP has been frequently used as a test-case for parallel B&B algorithms, as the huge amount of generated nodes and the highly irregular structure of the search tree raise multiple challenges in terms of design and implementation on increasingly complex parallel architectures, e.g. grid computing (Mezmaz et al., 2007; Drozdowski et al., 2011; Bendjoudi et al., 2012), multicore CPUs (Mezmaz et al., 2014a; Gmys et al., 2016a), GPUs and many-core devices (Chakroun et al., 2013; Gmys et al., 2016b; Melab et al., 2018), clusters of GPUs (Vu and Derbel, 2016) or FPGAs (Daouri et al., 2015).…”

Section: Parallelism

mentioning

confidence: 99%

A computationally efficient Branch-and-Bound algorithm for the permutation flow-shop scheduling problem

Gmys, Mezmaz, Melab et al. 2020

European Journal of Operational Research


In this work we propose an efficient branch-and-bound (B&B) algorithm for the permutation flowshop problem (PFSP) with makespan objective. We present a new node decomposition scheme that combines dynamic branching and lower bound refinement strategies in a computationally efficient way. To alleviate the computational burden of the two-machine bound used in the refinement stage, we propose an online learning-inspired mechanism to predict promising couples of bottleneck machines. The algorithm offers multiple choices for branching and bounding operators and can explore the search tree either sequentially or in parallel on multi-core CPUs. In order to empirically determine the most efficient combination of these components, a series of computational experiments with 600 benchmark instances is performed. A main insight is that the problem size, as well as interactions between branching and bounding operators substantially modify the trade-off between the computational requirements of a lower bound and the achieved tree size reduction. Moreover, we demonstrate that parallel tree search is a key ingredient for the resolution of large problem instances, as strong super-linear speedups can be observed. An overall evaluation using two well-known benchmarks indicates that the proposed approach is superior to previously published B&B algorithms. For the first benchmark we report the exact resolution, within less than 20 minutes, of two instances defined by 500 jobs and 20 machines that remained open for more than 25 years, and for the second a total of 89 improved best-known upper bounds, including proofs of optimality for 74 of them.

In contrast, exact methods allow to find optimal solution(s) with a proof of optimality, but their execution time is unpredictable and exponential in the worst-case.

Branch-and-Bound (B&B) is the most frequently used exact method to solve combinatorial optimization problems like the PFSP. The algorithm recursively decomposes the initial problem by dynamically constructing and exploring a search-tree, whose root node represents the initial problem, leaf nodes are possible solutions and internal nodes are subproblems of the initial problem. This is done using four operators: branching, bounding, selection and pruning. The branching operator divides the initial problem into smaller disjoint subproblems and a bounding function computes lower bounds on the optimal cost of a subproblem. The pruning operator eliminates subproblems whose lower bound exceeds the cost of the best solution found so far (upper bound on the optimal makespan). The tree-traversal is guided by the selection operator which returns the next subproblem to be processed according to a search strategy (e.g. depth-first search).

In this paper the focus is put on three performance-critical components of the algorithm: the lower bound (LB), the branching rule and the use of parallel tree exploration. Although they can be separated on a conceptual level, the main objective of this article is to reveal interactions between these compone...
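The four operators described above (branching, bounding, selection, pruning) can be illustrated with a minimal sequential sketch for the PFSP. This is not the algorithm of the cited paper: the function names `completion_front`, `lower_bound` and `branch_and_bound` are hypothetical, the bound is a simple one-machine bound chosen for brevity rather than the two-machine bound mentioned in the abstract, branching is plain forward branching, and subproblems whose bound equals the incumbent are pruned as well, which is safe when a single optimal schedule is sought.

```python
from typing import List, Optional, Sequence, Tuple


def completion_front(p: Sequence[Sequence[int]], seq: Sequence[int], m: int) -> List[int]:
    """Completion time on each machine after scheduling the jobs in seq."""
    front = [0] * m
    for j in seq:
        front[0] += p[j][0]
        for k in range(1, m):
            front[k] = max(front[k], front[k - 1]) + p[j][k]
    return front


def lower_bound(p: Sequence[Sequence[int]], front: Sequence[int],
                remaining: Sequence[int]) -> int:
    """Simple one-machine bound (an assumption of this sketch).

    For each machine k: time already committed on k, plus the total work of
    the unscheduled jobs on k, plus the shortest possible tail after k.
    """
    best = 0
    for k in range(len(front)):
        load = sum(p[j][k] for j in remaining)
        tail = min(sum(p[j][k + 1:]) for j in remaining)
        best = max(best, front[k] + load + tail)
    return best


def branch_and_bound(p: Sequence[Sequence[int]]) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Return (optimal makespan, one optimal job order) for processing times p[job][machine]."""
    n, m = len(p), len(p[0])
    best_cost: float = float("inf")          # upper bound (incumbent)
    best_seq: Optional[Tuple[int, ...]] = None
    stack = [((), tuple(range(n)))]          # root node: nothing scheduled yet
    while stack:
        seq, remaining = stack.pop()         # selection: depth-first (LIFO)
        front = completion_front(p, seq, m)
        if not remaining:                    # leaf node: a complete schedule
            if front[-1] < best_cost:
                best_cost, best_seq = front[-1], seq
            continue
        # bounding + pruning: discard nodes that cannot improve the incumbent
        if lower_bound(p, front, remaining) >= best_cost:
            continue
        for j in remaining:                  # branching: fix the next job
            rest = tuple(x for x in remaining if x != j)
            stack.append((seq + (j,), rest))
    return best_cost, best_seq


if __name__ == "__main__":
    # Three jobs, two machines; rows are jobs, columns are machines.
    times = [[3, 2], [1, 4], [2, 1]]
    print(branch_and_bound(times))
```

For the three-job instance in the usage example the optimal makespan is 8; the job order returned depends on the traversal order, since several schedules may tie.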


“…Parallelization strategies can be combined to exploit complementary ways of parallelizations. For example, low-level and domain decomposition parallelism have been jointly applied to branch-and-X algorithms [Vu and Derbel, 2016, Adel et al, 2016] and to dynamic programming [Maleki et al, 2016], and low-level and multi-search parallelism to genetic algorithms [Abbasian and Mouhoub, 2013, Munawar et al, 2009]. In total, we found eight studies which apply such combinations.…”

Section: Algorithmic Parallelization and Computational Parallelization

mentioning

confidence: 99%

“…Finally, it should be noticed that parallelization strategies are not mutually incompatible and may be combined into comprehensive algorithmic designs [Crainic et al, 2006, Crainic, 2019]. For example, low-level and decomposition parallelism have been jointly applied to branch-and-bound [Adel et al, 2016] and dynamic programming [Vu and Derbel, 2016], [Maleki et al, 2016], and low-level parallelism and cooperative multi-search have been applied to a hybrid metaheuristic [Munawar et al, 2009] which uses a genetic algorithm and hill climbing. While the aforementioned parallelization strategies have been formulated for the class of metaheuristics, the strategy-defining principles are of general nature of parallelizing optimization algorithms so that the scope of applicability of the parallelization strategies can be straightforward extended to other algorithm classes, including exact methods and (problem-specific) heuristics. For example, Gendron and Crainic [1994] have defined three types of parallelism for branch-and-bound: their type 1 parallelism refers to parallelism when performing operations on generated subproblems, such as executing the bounding operation in parallel for each subproblem.…”

mentioning

confidence: 99%

Parallel computational optimization in operations research: A new integrative framework, literature review and research directions

Schryen 2020

European Journal of Operational Research


Solving optimization problems with parallel algorithms has a long tradition in OR. Its future relevance for solving hard optimization problems in many fields, including finance, logistics, production and design, is leveraged through the increasing availability of powerful computing capabilities. Acknowledging the existence of several literature reviews on parallel optimization, we did not find reviews that cover the most recent literature on the parallelization of both exact and (meta)heuristic methods. However, in the past decade substantial advancements in parallel computing capabilities have been achieved and used by OR scholars so that an overview of modern parallel optimization in OR that accounts for these advancements is beneficial. Another issue from previous reviews results from their adoption of different foci so that concepts used to describe and structure prior literature differ. This heterogeneity is accompanied by a lack of unifying frameworks for parallel optimization across methodologies, application fields and problems, and it has finally led to an overall fragmented picture of what has been achieved and still needs to be done in parallel optimization in OR. This review addresses the aforementioned issues with three contributions: First, we suggest a new integrative framework of parallel computational optimization across optimization problems, algorithms and application domains. The framework integrates the perspectives of algorithmic design and computational implementation of parallel optimization. Second, we apply the framework to synthesize prior research on parallel optimization in OR, focusing on computational studies published in the period 2008-2017. Finally, we suggest research directions for parallel optimization in OR.

Keywords: computing science · parallel optimization · computational optimization · literature review

* I am grateful for the support provided by Abdullah Burak, Philip Empl, Constanze Hilmer, Gerhard Rauchecker, Richard Schuster, Henning Siemes, and Melih Yilmaz, who supported me substantially in searching and coding research articles.

2 Impressive computational results of applying parallelization to the traveling salesman problem (TSP) are reported by Crainic et al. [2006, p.2].

arXiv:1910.03028v1 [cs.DC] 3 Oct 2019

Parallel computational optimization in operations research is challenging in general from both the algorithmic and the computational perspective, and ii) a viable alternative to parallelizing algorithms has been the exploitation of ongoing increases of clock speed of single CPUs of modern microprocessors. But this growth process reached a limit already several years ago due to heat dissipation and energy consumption issues. This development makes parallelization efforts (not only in optimization) much more important than it was in earlier times.

Fortunately, the need for parallelization has been acknowledged and accompanied by an increased availability of parallel computing resources. This availability is rooted in two phenomena: a) the rapid development of para...


“…The main feature of multicore chip is that tremendous increase in performance by increasing the number of cores instead of increasing the frequency [3]. To improve the multicore CPU performance, three factors namely parallelism granularity, incorrect programming model and language compilers to be tuned [10]. Nowadays, most of the embedded applications are parallel processing application.…”

mentioning

confidence: 99%

An Application based Efficient Thread Level Parallelism Scheme on Heterogeneous Multicore Embedded System for Real Time Image Processing.

Indragandhi, Jawahar 2020

SCPE


Recent embedded devices are equipped with multicore processors, which significantly improve system performance. To utilize all cores of a multicore processor efficiently, application programs need to be parallelized. This paper proposes an efficient thread level parallelism (ETLP) scheme and evaluates it on a computationally intensive edge detection algorithm. Edge detection is an important process in various real-time applications, such as vehicle detection in traffic control and medical image processing. The main objective of the ETLP scheme is to reduce the execution time and increase the CPU core utilization. The performance of the ETLP scheme is compared with a basic edge detection scheme (BEDS) for different image sizes. The experimental results reveal that the proposed ETLP scheme achieves an efficiency of 49% and 72% for the image sizes 300 × 256 and 1024 × 1024, respectively. Furthermore, the ETLP scheme reduces execution time by 66% for the image size 1024 × 1024 when compared with BEDS.
