Parallelization of the finite-element method (FEM) has been studied by the scientific and high-performance computing community for over a decade. Most of the computations in the FEM are linear algebra operations on matrices and vectors. These operations follow the single-instruction multiple-data (SIMD) computation pattern, which suits shared-memory parallel architectures. General-purpose graphics processing units (GPGPUs) have been used effectively to parallelize FEM computations since 2007. The solver step of the FEM is often carried out using conjugate gradient (CG)-type iterative methods because of their faster convergence and greater opportunities for parallelization. Although the SIMD computation patterns in the FEM are a natural fit for GPU computing, there are some pitfalls, such as underutilization of threads, uncoalesced memory access, low arithmetic intensity, the limited capacity of the faster memories on GPUs, and synchronization. Nevertheless, FEM applications have been successfully deployed on GPUs over the last 10 years, with significant performance improvements. This paper presents a comprehensive review of the parallel optimization strategies applied in each step of the FEM. The pitfalls and tradeoffs linked to each step are also discussed. Furthermore, some notable methods that exploit the large amount of computing power available on a GPU are discussed. This review is not limited to a single field of engineering. Rather, it is applicable to all fields of engineering and science in which FEM-based simulations are necessary.
A fixed-grid discretization strategy is proposed for static structural Finite Element Analysis (FEA) of Functionally Graded Materials (FGM). For isotropic materials, the fixed-grid strategy reduces numerical integration cost dramatically because only a single local stiffness matrix has to be generated. For FGMs, the domain is discretized into layers in such a way that the material properties in each layer are constant. Therefore, a single local stiffness matrix is constructed for each layer. These matrices are used directly in the solver phase of the assembly-free FEM, without constructing the global stiffness matrix. The fixed-grid strategy reduces global memory transactions on the GPU by storing these elemental matrices in on-chip shared memory or cached constant memory. Furthermore, the assembly-free method is adopted to leverage fine-grained parallelism on the GPU at the degree-of-freedom level. Numerical experiments showed the effectiveness of the discrete layered approach for FGM using the fixed-grid strategy. For performance evaluation, two strategies, one using global memory and one using shared memory, were compared; the use of shared memory was found to achieve approximately 2.4 times better performance than global memory.
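The assembly-free product with one local stiffness matrix per layer can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a 1D bar with two-node elements on a uniform grid and runs serially on the CPU, and the names `local_stiffness`, `assembly_free_matvec`, and `assembled_matvec` are hypothetical. The outer loop over nodes stands in for the one-thread-per-degree-of-freedom mapping described above.

```python
def local_stiffness(young, area, h):
    """2x2 stiffness matrix of a two-node bar element of length h."""
    k = young * area / h
    return [[k, -k], [-k, k]]


def assembly_free_matvec(layer_mats, elem_layer, x):
    """Compute y = K x without forming the global stiffness matrix K.

    Element e connects nodes e and e + 1 (assumed 1D fixed grid).
    Each y[i] is gathered independently from the elements touching
    node i, so no two iterations write to the same output entry.
    """
    n_elem = len(elem_layer)
    n_node = n_elem + 1
    assert len(x) == n_node
    y = [0.0] * n_node
    for i in range(n_node):                # one "thread" per DOF
        acc = 0.0
        for e in (i - 1, i):               # elements adjacent to node i
            if 0 <= e < n_elem:
                ke = layer_mats[elem_layer[e]]  # shared per-layer matrix
                a = i - e                       # local row index: 0 or 1
                acc += ke[a][0] * x[e] + ke[a][1] * x[e + 1]
        y[i] = acc
    return y


def assembled_matvec(layer_mats, elem_layer, x):
    """Reference only: assemble a dense K, then multiply."""
    n = len(elem_layer) + 1
    K = [[0.0] * n for _ in range(n)]
    for e, layer in enumerate(elem_layer):
        ke = layer_mats[layer]
        for a in range(2):
            for b in range(2):
                K[e + a][e + b] += ke[a][b]
    return [sum(K[i][j] * x[j] for j in range(n)) for i in range(n)]


# Three layers with constant (illustrative) Young's modulus, two elements each.
layer_mats = [local_stiffness(E, 1.0, 0.5) for E in (1.0, 2.0, 4.0)]
elem_layer = [0, 0, 1, 1, 2, 2]
x = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
y = assembly_free_matvec(layer_mats, elem_layer, x)
```

Only `layer_mats` (three small matrices here) and the per-element layer index are stored, however many elements the grid has. On a GPU, that small table is the kind of data that could plausibly be held in shared or constant memory, while the gather loop would become the body of a per-DOF thread.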