2013
DOI: 10.1016/j.jpdc.2012.09.006
Benchmarking of communication techniques for GPUs

Cited by 16 publications (19 citation statements)
References 8 publications
“…On the same GPU, when the counter pool contains the same number of counters, VATE's PT is significantly lower than VDRE's (VATE's PT is only 0.25% to 25% of VDRE's). On the GTX650-1GB, when the number of counters is 2^28, the PT of VDRE is as high as 1296 milliseconds, and the sum of the three running times is 1447 milliseconds. For an algorithm under a sliding time window, the total running time in each time slice must not exceed the length of the time slice.…”
Section: Methods (mentioning)
confidence: 99%
“…A GPU chip contains hundreds to thousands of processing units, far more than a CPU. For tasks that have no data-access conflicts and that process different data with the same instructions (single instruction, multiple data: SIMD), the GPU can achieve a high speedup [28] [29].…”
Section: Deploy VATE on GPU (mentioning)
confidence: 99%
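
The SIMD pattern these excerpts refer to is easy to see in a toy CUDA kernel: every thread executes the same instruction stream on a different element, and each thread writes only its own slot, so there are no data-access conflicts. The sketch below is a generic illustration of that pattern, not code from the cited papers; the operation (squaring) and all names and sizes are made up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Generic SIMD-style kernel: every thread runs the same instructions
// on a different element and writes only its own slot, so there are
// no data-access conflicts. The squaring operation is an arbitrary
// stand-in, not code from the cited papers.
__global__ void square(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];
}

int main() {
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = (float)i;

    square<<<(n + 255) / 256, 256>>>(in, out, n);  // one thread per element
    cudaDeviceSynchronize();

    printf("out[3] = %f\n", out[3]);  // expect 9.0
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```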
“…The graphics processing unit (GPU) is one of the most popular parallel computing platforms of recent years. For tasks that have no data-access conflicts and that process different data with the same instructions (SIMD), the GPU can achieve a high speedup [2] [19]. Every packet will update the SEAV and the LDCA.…”
Section: Distributed Super Points Detection on GPU (mentioning)
confidence: 99%
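
For the per-packet updates mentioned in this excerpt, a hedged sketch of the general pattern: SEAV and LDCA are structures specific to the citing paper and are not defined here, so a plain hash-indexed counter array stands in for them, with atomicAdd resolving packets that collide on the same slot. All names and sizes are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical per-packet update: a flat, hash-indexed counter array
// stands in for the cited paper's SEAV and LDCA structures, which are
// not defined in the excerpt. atomicAdd resolves the case where two
// packets hash to the same slot.
__global__ void update_counters(const unsigned* pkt_hash, int n_pkts,
                                unsigned* counters, unsigned n_ctrs) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pkts)
        atomicAdd(&counters[pkt_hash[i] % n_ctrs], 1u);  // one update per packet
}

int main() {
    const int n_pkts = 1 << 20;
    const unsigned n_ctrs = 1 << 16;
    unsigned *pkt_hash, *counters;
    cudaMallocManaged(&pkt_hash, n_pkts * sizeof(unsigned));
    cudaMallocManaged(&counters, n_ctrs * sizeof(unsigned));
    for (int i = 0; i < n_pkts; ++i) pkt_hash[i] = 2654435761u * i;  // fake hashes
    cudaMemset(counters, 0, n_ctrs * sizeof(unsigned));

    update_counters<<<(n_pkts + 255) / 256, 256>>>(pkt_hash, n_pkts,
                                                   counters, n_ctrs);
    cudaDeviceSynchronize();
    printf("counters[0] = %u\n", counters[0]);
    cudaFree(pkt_hash);
    cudaFree(counters);
    return 0;
}
```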
“…As far as we know, there are only a few works showing strong-scaling results for spin systems [13,20,21]. We chose to adopt the same technique proposed in [20,21], where the partitioning is performed along the z-axis of the system. All communication among nodes is handled by MPI, and the overlap between calculation and communication is achieved by using CUDA streams.…”
Section: Multi-GPU Implementation (mentioning)
confidence: 99%
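
A minimal sketch of the overlap pattern this excerpt describes, assuming a 1-D slab decomposition along z with periodic neighbours: halo staging and the MPI exchange run on one CUDA stream while the interior update proceeds on another. Kernel bodies, array names, and sizes are placeholders, not the cited implementation.

```cuda
#include <mpi.h>
#include <cuda_runtime.h>

// Placeholder kernels: the real spin-update kernels from the cited
// works are not shown in the excerpt.
__global__ void update_interior(float* f, int plane, int nz) {
    // ... update z-planes 2 .. nz-1, which need no remote data ...
}
__global__ void update_boundary(float* f, int plane, int nz) {
    // ... update z-planes 1 and nz once the halos have arrived ...
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up = (rank + 1) % size, down = (rank + size - 1) % size;

    const int plane = 256 * 256;          // points per z-plane (assumed)
    const int nz = 64;                    // owned planes per rank
    size_t bytes = (size_t)plane * sizeof(float);

    float* f;                             // layout: [halo | nz planes | halo]
    cudaMalloc(&f, (nz + 2) * bytes);
    float *send_lo, *send_hi, *recv_lo, *recv_hi;
    cudaMallocHost(&send_lo, bytes);      // pinned buffers, so the async
    cudaMallocHost(&send_hi, bytes);      // copies can overlap the kernel
    cudaMallocHost(&recv_lo, bytes);
    cudaMallocHost(&recv_hi, bytes);

    cudaStream_t compute, comm;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&comm);

    for (int step = 0; step < 100; ++step) {
        // Stage the two outermost owned planes on the comm stream...
        cudaMemcpyAsync(send_lo, f + plane, bytes,
                        cudaMemcpyDeviceToHost, comm);
        cudaMemcpyAsync(send_hi, f + (size_t)nz * plane, bytes,
                        cudaMemcpyDeviceToHost, comm);
        // ...while the interior update, which needs no remote data, runs
        // concurrently on the compute stream: this is the overlap of
        // calculation and communication the excerpt refers to.
        update_interior<<<256, 256, 0, compute>>>(f, plane, nz);

        // Exchange halos with the two z-neighbours once staging is done.
        cudaStreamSynchronize(comm);
        MPI_Sendrecv(send_lo, plane, MPI_FLOAT, down, 0,
                     recv_hi, plane, MPI_FLOAT, up, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(send_hi, plane, MPI_FLOAT, up, 1,
                     recv_lo, plane, MPI_FLOAT, down, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Push the received halos back and finish the boundary planes.
        cudaMemcpyAsync(f, recv_lo, bytes, cudaMemcpyHostToDevice, comm);
        cudaMemcpyAsync(f + (size_t)(nz + 1) * plane, recv_hi, bytes,
                        cudaMemcpyHostToDevice, comm);
        cudaStreamSynchronize(comm);
        cudaStreamSynchronize(compute);
        update_boundary<<<256, 256, 0, compute>>>(f, plane, nz);
        cudaDeviceSynchronize();          // boundary done before next step
    }

    MPI_Finalize();
    return 0;
}
```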