Efficient Processing of Large Data Structures on GPUs: Enumeration Scheme Based Optimisation

Gorawski, Marcin; Lorek, Michal

doi:10.1007/s10766-017-0515-0

Cited by 4 publications

(2 citation statements)

References 27 publications

(40 reference statements)


“…In addition, we are interested in performing a quantitative performance comparison between the Hamilton graph-based method and HNN and EACS in the future. The optimization method based on enumeration proposed in [41] can improve the computing efficiency of large-scale matrices by 300%. We are interested in exploring this method for the determination of intelligent manufacturing scheme.…”

Section: Discussion (mentioning)

confidence: 99%

A Graph Theory-Based Optimization Design for Complex Manufacturing Processes

et al. 2020


The manufacturing process of modern equipment becomes very complex due to features such as mass units, multiple machining, and complicated coupling relationships, posing a big challenge for determining the manufacturing scheme. This paper addresses the challenge by proposing a graph theory-based optimization design for the complex manufacturing process. A detailed analysis of a series of graph models built according to the manufacturing process features reveals that the Hamilton graph is suitable for modeling the manufacturing process system. Some model weight assignment functions are extracted for the quantitative study. Further, the optimal scheme for an optimization design of the complex manufacturing process is solved using the full link graph feature algorithm, a search optimization algorithm. A manufacturing model matrix is constructed, and a penalty number and divisor are formulated to simplify the matrix and improve the algorithm efficiency in the process of algorithm design. An example is provided to demonstrate the feasibility and effectiveness of the proposed method.

Index Terms: Graph model, graph theory, manufacturing process, model weight, optimization design.

ZHONG HAN received the B.S. degree in computer application technology and the M.S. degree in computer science and technology from the University of Electronical Science and Technology, Chengdu, China, respectively, and the Ph.D. degree in mechanical engineering and automation from Xi'an Jiaotong University, Xi'an, China. He was a Postdoctoral Researcher of intelligent manufacturing technology with the Xi'an Jiaotong University of China. He is currently an Associate


“…Our future work will involve understanding better the parallel performance of our implementations, and analysing the caching behavior of the TT and GCD transpose algorithms on a wider variety of systems, including graphical processing units. Gorawski and Lorek have shown how different enumeration schemes can mitigate the performance degradation associated with Translation Lookaside Buffer misses when transposing square matrices. Their work may be relevant to the GCD transpose algorithm, which involves transposing square matrices in‐place, and this will be investigated in future work.…”

Section: Discussion (mentioning)

confidence: 99%

Algorithms for in‐place matrix transposition

Gustavson, Walker 2018

Concurrency and Computation


Summary: This paper presents implementations of in‐place algorithms for transposing rectangular matrices. One implementation is a swap‐based algorithm described by Tretyakov and Tyrtyshnikov [1], to which we have introduced a number of variations. In particular, we show how the original algorithm can be modified to require constant additional memory. A proof of correctness is also sketched. This algorithm is compared with cycle‐following approaches and with the swap‐based GCD Transpose algorithm that partitions the matrix into a hierarchy of square submatrices. The performance of parallel implementations on a multicore system is also investigated.
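For readers unfamiliar with the cycle-following approach mentioned in the summary, the following is a minimal textbook-style sketch in Python. It is not the implementation from the paper and not the Tretyakov and Tyrtyshnikov or GCD Transpose algorithm. The function name `transpose_in_place` and the flat row-major list layout are assumptions made for illustration only. Each cycle is processed only from its smallest index, which avoids a visited-flag array at the cost of re-walking cycles.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a rows x cols row-major matrix stored in the flat list `a`.

    Illustrative cycle-following sketch (hypothetical helper, not the
    paper's code). Uses O(1) extra memory beyond a few scalars.
    """
    n = rows * cols
    if len(a) != n:
        raise ValueError("buffer length must equal rows * cols")
    if n < 2:
        return a
    last = n - 1

    def dest(i):
        # The element at (r, c) = divmod(i, cols) must move to c * rows + r,
        # which equals (i * rows) mod (n - 1) for every index except the last.
        return i if i == last else (i * rows) % last

    for start in range(1, last):
        # Walk the cycle once to check that `start` is its smallest index.
        j = dest(start)
        while j > start:
            j = dest(j)
        if j < start:
            continue  # this cycle was already handled from a smaller index

        # Rotate the values along the cycle, carrying one element at a time.
        carried = a[start]
        j = dest(start)
        while j != start:
            a[j], carried = carried, a[j]
            j = dest(j)
        a[start] = carried
    return a


# Usage: a 2 x 3 matrix [[1, 2, 3], [4, 5, 6]] becomes the 3 x 2 matrix
# [[1, 4], [2, 5], [3, 6]], again stored row-major in the same list.
m = [1, 2, 3, 4, 5, 6]
transpose_in_place(m, 2, 3)
```

The index permutation jumps across the whole buffer, so this kind of traversal tends to have poor locality on large matrices, which is the sort of access-pattern cost the citation statement above alludes to.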

show abstract