2017 23rd International Conference on Automation and Computing (ICAC) 2017
DOI: 10.23919/iconac.2017.8082085

Dynamic load balancing on multi-GPUs system for big data processing

Abstract: The powerful parallel computing capability of modern GPU (Graphics Processing Unit) processors has attracted increasing attention from researchers and engineers, who have conducted a large number of GPU-based acceleration research projects. However, current single-GPU solutions are still incapable of fulfilling the real-time computational requirements of the latest big data applications. Thus, the multi-GPU solution has become a trend for many real-time application attempts. In those cases, the c…

Cited by 4 publications (4 citation statements)
References: 12 publications

“…
    auto kernel = file_read("binomial.cl");
    auto samples = 16777216; auto steps = 254;
    auto steps1 = steps + 1; auto lws = steps1;
    auto samplesBy4 = samples / 4;
    auto gws = lws * samplesBy4;
    vector<cl_float4> in(samplesBy4);
    vector<cl_float4> out(samplesBy4);

    binomial_init_setup(samplesBy4, in, out);
    […]
    program.in(in);
    program.out(out);

    program.out_pattern(1, lws);

    program.kernel(kernel, "binomial_opts");
    program.arg(0, steps); // positional by index
    program.arg(in); // aggregate
    program.arg(out);
    program.arg(steps1 * sizeof(cl_float4),
                ecl::Arg::LocalAlloc);
    program.arg(4, steps * sizeof(cl_float4),
                ecl::Arg::LocalAlloc);

    engine.use(std::move(program));

    engine.run();

    // if (engine.has_errors()) // [Optional lines]
    //   for (auto& err : engine.get_errors())
    //     show or process errors

Listing 1: EngineCL API used in Binomial benchmark.…”
Section: Case 1: Using Only One Device (mentioning)
confidence: 99%
“…The experiments have been carried out using two different machines to validate both code portability and performance of EngineCL.

    auto kernel = file_read("nbody.cl");
    auto gpu_kernel = file_read("nbody.gpu.cl");
    auto phi_kernel_bin =
        file_read_binary("nbody.phi.cl.bin");
    auto bodies = 512000; auto del_t = 0.005f;
    auto esp_sqr = 500.0f; auto lws = 64;
    auto gws = bodies;
    vector<cl_float4> in_pos(bodies);
    vector<cl_float4> in_vel(bodies);
    vector<cl_float4> out_pos(bodies);
    vector<cl_float4> out_vel(bodies);

    nbody_init_setup(bodies, del_t, esp_sqr, in_pos,
                     in_vel, out_pos, out_vel);

    ecl::EngineCL engine;
    engine.use(ecl::Device(0, 0),
               ecl::Device(0, 1, phi_kernel_bin),
               ecl::Device(1, 0, gpu_kernel));

    engine.work_items(gws, lws);

    auto props = { 0.08, 0.3 };
    engine.scheduler(ecl::Scheduler::Static(props));

    ecl::Program program;
    program.in(in_pos);
    program.in(in_vel);
    program.out(out_pos);
    program.out(out_vel);

    program.kernel(kernel, "nbody");
    program.args(in_pos, in_vel, bodies, del_t,
                 esp_sqr, out_pos, out_vel);

    engine.program(std::move(program));

    engine.run();

Listing 2: EngineCL API used in NBody benchmark.…”
Section: System Setup (mentioning)
confidence: 99%
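
Listing 2 passes two proportions, `{ 0.08, 0.3 }`, to a static scheduler while three devices are in use. One plausible reading is that the first two devices receive those fractions of the work and the last device receives the remainder; the excerpt does not state this, so it is an assumption. The sketch below illustrates such a split in plain C++. The helper `split_static` is hypothetical and is not part of the EngineCL API; it also assumes that each device must receive whole work-groups of size `lws`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of EngineCL): split `gws` work-items among
// props.size() + 1 devices. Device i gets about props[i] of the work-groups,
// rounded to the nearest whole group; the last device gets what is left.
std::vector<std::size_t> split_static(std::size_t gws, std::size_t lws,
                                      const std::vector<double>& props) {
    assert(lws > 0 && gws % lws == 0);  // assumption: whole work-groups only
    const std::size_t total_groups = gws / lws;
    std::size_t remaining = total_groups;

    std::vector<std::size_t> chunks;
    chunks.reserve(props.size() + 1);
    for (double p : props) {
        assert(p >= 0.0 && p <= 1.0);
        const double ideal = p * static_cast<double>(total_groups);
        auto groups = static_cast<std::size_t>(std::llround(ideal));
        if (groups > remaining) groups = remaining;  // never hand out more than exists
        chunks.push_back(groups * lws);
        remaining -= groups;
    }
    chunks.push_back(remaining * lws);  // last device takes the remainder
    return chunks;
}
```

With the values from Listing 2 (`gws = 512000`, `lws = 64`), calling `split_static(512000, 64, {0.08, 0.3})` yields chunks of 40960, 153600 and 317440 work-items, which sum back to 512000.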
“…To optimize the load balancing problem among multi-GPU nodes for large scale applications with highly repetitive computational procedures or iterations, this paper presents a novel DLB model based on fuzzy neural network (FNN) and data set division techniques for heterogeneous multi-GPU systems, and this study is extended from our previous publication [24]. In this study, five real-time state feedback parameters closely relating to the computational performance of every GPU node are defined.…”
Section: Introduction (mentioning)
confidence: 99%
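
The statement above describes a feedback-driven balancer whose model (a fuzzy neural network over five state parameters) is not detailed in the excerpt. As a much simpler stand-in, and explicitly not the FNN approach, the sketch below shows the basic idea of feedback-based dynamic load balancing: after each iteration, every device's share of the next iteration is set in proportion to the throughput it just achieved. The function name `rebalance_shares` and its inputs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Illustrative only: throughput-proportional rebalancing, NOT the FNN model
// from the quoted paper. items_done[i] is the number of items device i
// processed in the last iteration; seconds[i] is how long that took.
// Returns the fraction of the next iteration's work for each device.
std::vector<double> rebalance_shares(const std::vector<std::size_t>& items_done,
                                     const std::vector<double>& seconds) {
    assert(!items_done.empty() && items_done.size() == seconds.size());

    std::vector<double> shares(items_done.size());
    for (std::size_t i = 0; i < items_done.size(); ++i) {
        assert(seconds[i] > 0.0);
        shares[i] = static_cast<double>(items_done[i]) / seconds[i];  // items per second
    }

    const double total = std::accumulate(shares.begin(), shares.end(), 0.0);
    assert(total > 0.0);
    for (double& s : shares) s /= total;  // normalise so the shares sum to 1
    return shares;
}
```

For example, if two devices each processed 1000 items but took 1 s and 3 s respectively, the faster device would be assigned about three quarters of the next iteration. A real balancer would typically smooth these measurements over several iterations to avoid oscillation.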