Shale permeability is sufficiently low to require an unconventional scale of stimulation treatments, such as very-large-volume, high-rate, multistage hydraulic-fracturing applications. Upscaling of hydrocarbon transport processes in shales is challenging because of the low permeability and strong heterogeneity. Rock characterization with high-resolution imaging [X-ray tomography and scanning electron microscope (SEM)] is usually highly localized and contains significant uncertainties because of the small field of view. Therefore, an effective high-performance computing method is required to collect information over a larger scale to meet the ergodicity requirement in upscaling. The lattice Boltzmann (LB) method has received significant attention in computational fluid dynamics because of its capability in coping with complicated boundary conditions. A combination of high-resolution imaging and LB simulation is a powerful approach for evaluating the transport properties of a porous medium in a timely manner, on the basis of the numerical solution of the Navier-Stokes equations and Darcy's law. In this work, a graphics-processing-unit (GPU) -enhanced lattice Boltzmann simulator (GELBS) was developed, which was optimized by GPU parallel computing on the basis of the inherent parallelism of the LB method. Specifically, the LB method was used to implement the computational kernel; a sparse data structure was applied to optimize memory allocation; the OCCA (Medina et al. 2014) portability library was used, which enables the GELBS codes to use different application-programming interfaces (APIs) including open computing language (OpenCL), compute unified device architecture (CUDA), and open multiprocessing (OpenMP).OpenCL is an open standard for cross-platform parallel computing, CUDA is supported only by NVIDIA devices, and OpenMP is primarily used on central processing units (CPUs). It was found that the GPU-accelerated code was approximately 1,000 times faster than the unoptimized serial code and 10 times faster than the parallel code run on a standalone CPU. The CUDA code was slightly faster than OpenCL code on the NVIDA GPU because of the extra cost of OpenCL used to adapt to a heterogeneous platform. The GELBS was validated by comparing it with analytical solutions, laboratory measurements, and other independent numerical simulators in previous studies, and it was proved to have a second-order global accuracy. The GELBS was then used to analyze thin cuttings extracted from a sandstone reservoir and a shale-gas reservoir. The sandstone permeabilities were found relatively isotropic, whereas the shale permeabilities were strongly anisotropic because of the horizontal lamination structure. In shale cuttings, the average permeability in the horizontal direction was higher than that in the vertical direction by approximately two orders of magnitude. Correlations between porosity and permeability were observed in both rocks. The combination of GELBS and high-resolution imaging methods makes for a power-ful tool for perme...
Coordination and synchronization of parallel tasks is a major source of complexity in parallel programming. These constructs take many forms in practice including directed barrier and point-to-point synchronizations, termination detection of child tasks, and mutual exclusion in accesses to shared resources. A read-write lock is a synchronization primitive that supports mutual exclusion in cases when multiple reader threads are permitted to enter a critical section concurrently (read-lock), but only a single writer thread is permitted in the critical section (write-lock). Although support for reader threads increases ideal parallelism, the read-lock functionality typically requires additional mechanisms, including expensive atomic operations, to handle multiple readers. It is not uncommon to encounter cases in practice where the overhead to support read-lock operations overshadows the benefits of concurrent read accesses, especially for small critical sections.In this paper, we introduce a new read-write lock algorithm that reduces this overhead compared to past work. The correctness of the algorithm, including deadlock freedom, is established by using the Java Pathfinder model checker. We also show how the read-write lock primitive can be used to support high-level language constructs such as object-level isolation in Habanero-Java (HJ) [6]. Experimental results for a read-write microbenchmark and a concurrent SortedLinkedList benchmark demonstrate that a Java-based implementation of the proposed read-write lock algorithm delivers higher scalability on multiple platforms than existing read-write lock implementations, including ReentrantReadWriteLock from the java.util.concurrent library.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.