Summary
With the development of parallel computing architectures, larger and more complex finite element analyses (FEA) are being performed with higher accuracy and shorter execution times. Graphics processing units (GPUs) are among the major contributors to this computational breakthrough. This work presents a three-stage GPU-based FEA matrix generation strategy whose key idea is to decouple the computation of the global matrix indices from that of the global matrix values by means of a novel data structure referred to as the neighbor matrix. The first stage computes the neighbor matrix on the GPU from the unstructured mesh. Using this neighbor matrix, the indices and the values of the global matrix are computed separately in the second and third stages. The neighbor matrix is computed for three different element types. Two versions, which perform numerical integration and assembly either in the same kernel or in separate kernels, are implemented, and simulations are run on a single GPU for meshes of different sizes with up to three million degrees of freedom. Comparison with a GPU-based parallel implementation from the literature shows speedups ranging from 4× to 6× for the proposed workload division strategy. Furthermore, the same-kernel implementation outperforms the separate-kernel implementation by 70% to 150%, depending on the element type.
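The abstract does not specify the layout of the neighbor matrix, so the following is only a minimal serial Python sketch of the decoupling idea. It assumes that the neighbor matrix holds, for each node, the sorted list of nodes that share at least one element with it, that there is one degree of freedom per node, and that the global matrix is stored in CSR format. The function names are hypothetical, and the plain loops stand in for what the paper runs as GPU kernels.

```python
# Hypothetical serial sketch of a three-stage matrix generation scheme.
# Assumptions: one DOF per node, CSR output, and a "neighbor matrix" stored
# as one sorted neighbor list per node. Not the paper's GPU kernels.


def neighbor_matrix(elements, num_nodes):
    """Stage 1: for each node, the sorted nodes sharing an element with it."""
    neighbors = [set() for _ in range(num_nodes)]
    for elem in elements:
        for a in elem:
            neighbors[a].update(elem)  # includes the node itself
    return [sorted(s) for s in neighbors]


def csr_indices(nbr):
    """Stage 2: CSR row pointers and column indices, from the neighbor matrix only."""
    row_ptr = [0]
    col_idx = []
    for row in nbr:
        col_idx.extend(row)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def csr_values(elements, elem_mats, nbr, row_ptr):
    """Stage 3: add each element matrix into the CSR slots fixed in stage 2."""
    values = [0.0] * row_ptr[-1]
    for elem, ke in zip(elements, elem_mats):
        for i, a in enumerate(elem):
            for j, b in enumerate(elem):
                # Position of column b within row a, looked up in the neighbor list
                slot = row_ptr[a] + nbr[a].index(b)
                values[slot] += ke[i][j]
    return values


# Usage: two 1D bar elements sharing node 1, unit stiffness each
elements = [[0, 1], [1, 2]]
ke = [[1.0, -1.0], [-1.0, 1.0]]
nbr = neighbor_matrix(elements, 3)
row_ptr, col_idx = csr_indices(nbr)
values = csr_values(elements, [ke, ke], nbr, row_ptr)
print(row_ptr, col_idx, values)
# -> [0, 2, 5, 7] [0, 1, 0, 1, 2, 1, 2] [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, 1.0]
```

Because stage 2 fixes the position of every nonzero before any numerical integration happens, stage 3 only has to look up a slot and accumulate into it. In a parallel version, elements that share a node would still contend for the same slot, which would typically call for atomic additions or element coloring; the abstract does not say which approach the authors use.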
Topology optimization has been successful in generating optimal topologies for various structures arising in real-world applications. Because these applications can involve large and complex domains, topology optimization suffers from a high computational cost, owing to the unstructured meshes needed to discretize these domains and to the finite element analysis (FEA) performed on them. This paper addresses this challenge by developing three GPU-based element-by-element strategies that target unstructured all-hexahedral meshes in a matrix-free preconditioned conjugate gradient (PCG) finite element solver. These strategies mainly perform the sparse matrix-vector multiplication (SpMV) arising in the FEA solver by allocating more GPU compute threads per element. Moreover, the strategies are designed to use GPU shared memory for efficient memory transactions. The proposed strategies are tested with the solid isotropic material with penalization (SIMP) method on four examples of 3D structural topology optimization. Results demonstrate that the proposed strategies achieve speedups of up to 8.2× over standard GPU-based SpMV strategies from the literature.
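The abstract names the ingredients (an element-by-element SpMV inside a matrix-free PCG solver, combined with SIMP) without giving code. As a rough illustration only, the serial Python sketch below assumes 2-node 1D bar elements in place of 8-node hexahedra, a Jacobi (diagonal) preconditioner, SIMP scaling of each element matrix by `rho_e ** penal`, and masking of fixed degrees of freedom. None of these choices, nor the function names, are taken from the paper. The per-element inner loop in `ebe_matvec` is the kind of work that the described GPU strategies spread over several threads per element.

```python
import math

# Hypothetical serial sketch of element-by-element (EbE) matrix-free SpMV inside
# a Jacobi-preconditioned CG solve. Assumptions: 2-node 1D elements stand in for
# hexahedra, SIMP scales each element matrix by rho_e**penal, and fixed DOFs
# are handled by masking (identity rows and zeroed columns).


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def ebe_matvec(x, elements, ke, rho, penal, fixed):
    """y = K x without assembling K: loop over elements and scatter-add."""
    xm = list(x)
    for d in fixed:
        xm[d] = 0.0  # zero the columns of fixed DOFs
    y = [0.0] * len(x)
    for e, elem in enumerate(elements):
        scale = rho[e] ** penal  # SIMP stiffness interpolation
        for i, a in enumerate(elem):
            y[a] += scale * sum(ke[i][j] * xm[b] for j, b in enumerate(elem))
    for d in fixed:
        y[d] = x[d]  # identity row for fixed DOFs
    return y


def ebe_diagonal(n, elements, ke, rho, penal, fixed):
    """Diagonal of K, also accumulated element by element, for the Jacobi preconditioner."""
    diag = [0.0] * n
    for e, elem in enumerate(elements):
        scale = rho[e] ** penal
        for i, a in enumerate(elem):
            diag[a] += scale * ke[i][i]
    for d in fixed:
        diag[d] = 1.0
    return diag


def pcg(b, matvec, diag, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)
    z = [ri / di for ri, di in zip(r, diag)]
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) < tol:
            break
        q = matvec(p)
        alpha = rz / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Usage: 4-element bar, node 0 fixed, unit tip load; last element at density 0.5
elements = [[0, 1], [1, 2], [2, 3], [3, 4]]
ke = [[1.0, -1.0], [-1.0, 1.0]]
rho, penal, fixed = [1.0, 1.0, 1.0, 0.5], 3.0, [0]
b = [0.0, 0.0, 0.0, 0.0, 1.0]
u = pcg(b, lambda v: ebe_matvec(v, elements, ke, rho, penal, fixed),
        ebe_diagonal(len(b), elements, ke, rho, penal, fixed))
print([round(ui, 6) for ui in u])  # -> [0.0, 1.0, 2.0, 3.0, 11.0]
```

The example behaves like springs in series: the last element has stiffness 0.5³ = 0.125, so it adds 1 / 0.125 = 8 to the tip displacement. The global stiffness matrix is never stored; only the element matrix, the densities, and the connectivity are needed, which is what makes the matrix-free approach attractive for large meshes.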