2019
DOI: 10.1049/joe.2018.9178
GPU computing performance analysis on matrix multiplication

Cited by 7 publications (4 citation statements)
References 16 publications (18 reference statements)
“…The performance of the MMM has also been evaluated on diverse platforms such as GPUs and multi-core processors, in terms of execution time [35,36,37], among other metrics. Regarding reliability enhancement, both hardware and software solutions exist.…”
Section: Related Work
confidence: 99%
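The execution-time comparisons this statement refers to can be illustrated with a minimal micro-benchmark. The sketch below times a NumPy matrix multiply on the CPU; it is a hypothetical stand-in for the surveyed GPU and multi-core measurements, not the setup used in the cited papers.

```python
import time
import numpy as np

def time_matmul(n, repeats=3):
    """Return the best wall-clock time (seconds) over `repeats` runs
    of an n x n matrix multiply, the execution-time metric that
    platform comparisons of this kind typically report."""
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        A @ B  # the kernel under test
        times.append(time.perf_counter() - start)
    return min(times)

print(f"256x256 matmul: {time_matmul(256):.6f} s")
```

Taking the minimum over several repeats reduces noise from caching and scheduling, which matters when comparing platforms.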
“…Here, we represent membrane systems as matrices that can be divided into sub-blocks to balance the number of threads used in GPU thread blocks [35,36]. The objects in the membranes are subsequently assigned to matrix entries (Figure 4), thereby increasing the efficiency with which the matrix allocates the threads in the thread blocks.…”
Section: Proposed Approach
confidence: 99%
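The sub-block decomposition described in this statement, splitting a matrix so each sub-block maps onto one fixed-size GPU thread block, can be sketched on the CPU as follows. The function name and the zero-padding choice are illustrative assumptions, not the cited papers' code.

```python
import numpy as np

def partition_into_subblocks(matrix, block_dim):
    """Split a 2-D matrix into block_dim x block_dim sub-blocks,
    zero-padding the edges so every sub-block is full-sized.
    Each sub-block would be assigned to one GPU thread block."""
    rows, cols = matrix.shape
    pad_r = (-rows) % block_dim
    pad_c = (-cols) % block_dim
    padded = np.pad(matrix, ((0, pad_r), (0, pad_c)))
    n_r = padded.shape[0] // block_dim
    n_c = padded.shape[1] // block_dim
    # Result shape: (grid_rows, grid_cols, block_dim, block_dim)
    return padded.reshape(n_r, block_dim, n_c, block_dim).swapaxes(1, 2)

objects = np.arange(30).reshape(5, 6)      # e.g. membrane objects laid out as matrix entries
blocks = partition_into_subblocks(objects, 4)
print(blocks.shape)  # (2, 2, 4, 4): a 2x2 grid of 4x4 thread-block tiles
```

Fixing every sub-block to the thread-block size is what lets each tile use all of its block's threads, which is the load-balancing effect the statement describes.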
“…It determines whether assigning sub-matrices, including additional membranes, to each thread block will reduce communication between threads and increase GPU occupancy. Matrices are apportioned into sub-blocks to fully utilize the maximum possible number of threads in each thread block [35,36]. This method eliminates shortcomings of previously implemented methods, which applied one of the following two notions: (i) allocating any number of objects in every membrane to every thread block, or (ii) first designating an active membrane system in which the number of objects in every membrane reaches the maximum number of threads in a GPU thread block.…”
Section: Introduction
confidence: 99%
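A conventional way to realize this sub-block scheme for the multiplication itself is a tiled (blocked) matrix multiply, in which each output tile corresponds to one thread block. The following is a CPU sketch of that tiling under stated assumptions, not the cited implementation.

```python
import numpy as np

def tiled_matmul(A, B, tile=16):
    """Blocked C = A @ B. Each (i, j) output tile plays the role of
    one GPU thread block; the inner k-loop steps through tiles of
    A and B the way shared-memory staging would on a GPU."""
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, p))
    for i in range(0, n, tile):            # grid of thread blocks (rows)
        for j in range(0, p, tile):        # grid of thread blocks (cols)
            for k in range(0, m, tile):    # staged accumulation over k
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C
```

NumPy slicing clips ragged edges automatically, so matrix dimensions need not be multiples of the tile size; on a real GPU the edge tiles are the under-occupied blocks the statement's balancing scheme targets.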
“…The key is to distribute the computation among multiple threads so that it can be done concurrently. The computation can be divided across multiple memory hierarchies, where sub-tasks use different levels of memory, or across multiple computation resources such as threads or machines [93,94,95].…”
Section: Advances In Matrix Multiplication
confidence: 99%
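The thread-level division of work this statement describes can be imitated on the CPU with a thread pool, each worker computing one band of output rows. This is a sketch only; the cited works target GPU threads and distributed machines, and the function name is an assumption.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_matmul(A, B, n_workers=4):
    """Compute C = A @ B by splitting the output rows into bands and
    giving each band to its own thread, a CPU stand-in for dividing
    the computation across GPU threads or machines."""
    n = A.shape[0]
    C = np.empty((n, B.shape[1]))
    bands = np.array_split(np.arange(n), n_workers)

    def compute_band(rows):
        # Each worker writes a disjoint row band, so no locking is needed.
        C[rows] = A[rows] @ B

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(compute_band, bands))
    return C
```

Partitioning by output rows makes the sub-tasks independent, which is the property that lets the same decomposition scale from threads on one device to multiple machines.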