GPU‐based branch‐and‐bound method to solve large 0‐1 knapsack problems with data‐centric strategies

Shen, Jingcheng; Shigeoka, Kentaro; Ino, Fumihiko; Hagihara, Kenichi

doi:10.1002/cpe.4954

Cited by 18 publications

(17 citation statements)

References 31 publications

“…Future work is to deal with large problem sizes that cannot be naively stored in the GPU memory due to memory exhaustion. An out-of-core processing scheme [18], [19] may be useful for realizing this large-scale computation; the scheme divides data into small pieces and iteratively processes the pieces with overlapping GPU computation with CPU-GPU data transfer.…”

Section: Results (mentioning)

confidence: 99%

Accelerating the Held-Karp Algorithm for the Symmetric Traveling Salesman Problem

Kimura, Higa, Okita, et al. 2019

IEICE Trans. Inf. & Syst.

Self Cite

In this paper, we propose an acceleration method for the Held-Karp algorithm that solves the symmetric traveling salesman problem by dynamic programming. The proposed method achieves acceleration with two techniques. First, we locate data-independent subproblems so that the subproblems can be solved in parallel. Second, we reduce the number of subproblems by a meet in the middle (MITM) technique, which computes the optimal path from both clockwise and counterclockwise directions. We show theoretical analysis on the impact of MITM in terms of the time and space complexities. In experiments, we compared the proposed method with a previous method running on a single-core CPU. Experimental results show that the proposed method on an 8-core CPU was 9.5-10.5 times faster than the previous method on a single-core CPU. Moreover, the proposed method on a graphics processing unit (GPU) was 30-40 times faster than that on an 8-core CPU. As a side effect, the proposed method reduced the memory usage by 48%.

Key words: symmetric traveling salesman problem, Held-Karp algorithm, parallelization, meet in the middle, GPU
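
For orientation, the following is a minimal serial sketch of the textbook Held-Karp dynamic program that the abstract refers to. It is not the authors' implementation: it includes neither the parallelization nor the MITM reduction, and the function name `held_karp` and the dictionary-based table layout are assumptions made for illustration only. Subsets of the same size do not depend on one another, which is the kind of data independence the abstract mentions.

```python
from itertools import combinations


def held_karp(dist):
    """Return the length of the shortest tour for a square distance matrix.

    Illustrative sketch only (hypothetical helper, not from the cited paper).
    City 0 is fixed as the start and end of the tour.
    """
    n = len(dist)
    if n == 1:
        return 0

    # best[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities in bitmask `mask` (city 0 excluded), and ends at city j.
    best = {(1 << j, j): dist[0][j] for j in range(1, n)}

    for size in range(2, n):
        # All subsets of one size depend only on subsets of size - 1,
        # so they could be processed independently of each other.
        for subset in combinations(range(1, n), size):
            mask = sum(1 << j for j in subset)
            for j in subset:
                prev = mask ^ (1 << j)  # same subset without the end city j
                best[(mask, j)] = min(
                    best[(prev, k)] + dist[k][j] for k in subset if k != j
                )

    full = (1 << n) - 2  # every city except city 0
    return min(best[(full, j)] + dist[j][0] for j in range(1, n))


if __name__ == "__main__":
    example = [
        [0, 10, 15, 20],
        [10, 0, 35, 25],
        [15, 35, 0, 30],
        [20, 25, 30, 0],
    ]
    print(held_karp(example))
```

The table holds one entry per (subset, end city) pair, so both time and memory grow exponentially with the number of cities; this is the cost that the abstract's MITM technique is reported to reduce.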

“…Therefore, this challenge has led to the need to develop algorithms that can produce near-optimal solutions in a reasonable amount of time [12]. During the years, different solution approaches have been developed including exact algorithms (such as branch-and-bound [13,14] and branch-and-cut [15]), heuristic algorithms (such as the Clarke-Wright savings algorithm [16]), and metaheuristic algorithms (such as simulated annealing [17,18], genetic algorithms [19], tabu search [20], and ant algorithms [21]). Earlier, conventional heuristic algorithms were designed as a response to limited computer processing power.…”

Section: Introduction (mentioning)

confidence: 99%

Novel Route Planning System for Machinery Selection. Case: Slurry Application

2020

The problem of finding an optimal solution for the slurry application process is cast as a capacitated vehicle routing problem (CVRP): subject to the vehicle’s capacity, every track must be visited exactly once so that the field is fully covered, while complying with a specified target application rate. A key objective of this study was to determine an optimized coverage plan that minimizes the driving distance in the field while allowing the application rate to vary. The coverage plan includes the optimal sequence of tracks with a specified application rate for each track. Two algorithms were developed for optimization and simulation of the slurry application cast as capacitated operations. To validate the proposed algorithms, a slurry application operation was recorded, and the results of the optimization algorithm were compared with the conventional non-optimized method. The comparison showed that the proposed method reduces the non-working distance by 18.6% and the non-working time by 28.1%.

“…Currently, the graphics processing unit (GPU) is considered to be the most efficient architecture for parallel stencil code [7]. Armed with thousands of cores and 5-10 times higher memory bandwidth than CPUs, GPUs provide powerful solutions for both compute- and memory-intensive scientific problems [8]-[11]. However, there are two main challenges in implementing GPU-accelerated stencil code: limited capacity of device (i.e., GPU) memory and considerable programming effort to implement GPU-accelerated code.…”

Section: Introduction (mentioning)

confidence: 99%

A Data-Centric Directive-Based Framework to Accelerate Out-of-Core Stencil Computation on a GPU

Shen, Ino, Farrés, et al. 2020

IEICE Trans. Inf. & Syst.

Self Cite

Graphics processing units (GPUs) are highly efficient architectures for parallel stencil code; however, the small device (i.e., GPU) memory capacity (several tens of GBs) necessitates the use of out-of-core computation to process excess data. Great programming effort is needed to manually implement efficient out-of-core stencil code. To relieve such programming burdens, directive-based frameworks emerged, such as the pipelined accelerator (PACC); however, they usually lack specific optimizations to reduce data transfer. In this paper, we extend PACC with two data-centric optimizations to address data transfer problems. The first is a direct-mapping scheme that eliminates host (i.e., CPU) buffers, which intermediate between the original data and device buffers. The second is a region-sharing scheme that significantly reduces host-to-device data transfer. The extended PACC was applied to an acoustic wave propagator, automatically extending the length of original serial code 2.3-fold to obtain the out-of-core code. Experimental results revealed that on a Tesla V100 GPU, the generated code ran 41.0, 22.1, and 3.6 times as fast as implementations based on Open Multi-Processing (OpenMP), Unified Memory, and the previous PACC, respectively. The generated code also demonstrated usefulness with small datasets that fit in the device capacity, running 1.3 times as fast as an in-core implementation.
