2017
DOI: 10.1007/s10766-017-0515-0

Efficient Processing of Large Data Structures on GPUs: Enumeration Scheme Based Optimisation

Abstract: This paper highlights performance issues in matrix transposition algorithms for large matrices that relate to the Translation Lookaside Buffer (TLB) cache. Existing optimisation techniques such as coalesced access and the use of shared memory, however necessary and beneficial, are not sufficient to neutralise the problem. As the problem size increases, these optimisations do not exploit data locality effectively enough to counteract the detrimental effects of…
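To make the idea of an enumeration scheme concrete, the following is a minimal sketch of a tiled out-of-place transpose in which the order of visiting tiles is pluggable. It is not the paper's implementation: the function names (`row_major_tiles`, `morton_tiles`, `transpose_tiled`) are hypothetical, and the Z-order (Morton) curve is an assumed example of a locality-preserving enumeration, since the truncated abstract does not say which schemes the paper evaluates. Pure Python does not model a TLB, so the sketch only shows that different tile orders produce the same result while visiting memory in a different sequence.

```python
def row_major_tiles(n_tiles):
    """Enumerate tile coordinates row by row (the conventional order)."""
    return [(r, c) for r in range(n_tiles) for c in range(n_tiles)]


def morton_tiles(n_tiles):
    """Enumerate tile coordinates along a Z-order (Morton) curve.

    Assumption for illustration: n_tiles is a power of two.
    """
    def deinterleave(code):
        # Even bits of the code form the column, odd bits form the row.
        r = c = 0
        bit = 0
        while code:
            c |= (code & 1) << bit
            r |= ((code >> 1) & 1) << bit
            code >>= 2
            bit += 1
        return r, c

    return [deinterleave(i) for i in range(n_tiles * n_tiles)]


def transpose_tiled(a, n, tile, order):
    """Transpose an n x n row-major list tile by tile.

    `order` is a function returning the sequence of (tile_row, tile_col)
    pairs to visit; n must be divisible by tile.
    """
    out = [0] * (n * n)
    for tr, tc in order(n // tile):
        for i in range(tr * tile, (tr + 1) * tile):
            for j in range(tc * tile, (tc + 1) * tile):
                out[j * n + i] = a[i * n + j]
    return out
```

On a GPU, each tile would typically map to a thread block that stages data through shared memory; under that assumption, the enumeration decides which tiles are processed close together in time and therefore which memory pages are touched together.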

Cited by 4 publications (2 citation statements)
References 27 publications (40 reference statements)
“…In addition, we are interested in performing a quantitative performance comparison between the Hamilton graph-based method and HNN and EACS in the future. The optimization method based on enumeration proposed in [41] can improve the computing efficiency of large-scale matrices by 300%. We are interested in exploring this method for the determination of intelligent manufacturing scheme.…”
Section: Discussion (mentioning)
confidence: 99%
“…Our future work will involve understanding better the parallel performance of our implementations, and analysing the caching behavior of the TT and GCD transpose algorithms on a wider variety of systems, including graphical processing units. Gorawski and Lorek have shown how different enumeration schemes can mitigate the performance degradation associated with Translation Lookaside Buffer misses when transposing square matrices. Their work may be relevant to the GCD transpose algorithm, which involves transposing square matrices in‐place, and this will be investigated in future work.…”
Section: Discussion (mentioning)
confidence: 99%