We present the details of a GPU capable exchange correlation (XC) scheme integrated into the open source QUantum Interaction Computational Kernel (QUICK) program. Our implementation features an octree based numerical grid point partitioning scheme, GPU enabled grid pruning and basis/primitive function prescreening and fully GPU capable XC energy and gradient algorithms. Benchmarking against the CPU version demonstrated that the GPU implementation is capable of delivering an impres-sive performance while retaining excellent accuracy. For small to medium size protein/organic molecular systems, the realized speedups in double precision XC energy and gradient computation on a NVIDIA V100 GPU were 60 to 80-fold and 140 to 780-fold respectively as compared to the serial CPU implementation. The acceleration gained in density functional theory calculations from a single V100 GPU significantly exceeds that of a modern CPU with 40 cores running in parallel. File list (4) download file view on ChemRxiv SI_v2.0.pdf (301.12 KiB) download file view on ChemRxiv Manuscript_v2.0.pdf (1.70 MiB) download file view on ChemRxiv SI_v2.0.docx (81.63 KiB) download file view on ChemRxiv Manuscript_v2.0.docx (1.98 MiB)
Finding new ways to use artificial intelligence (AI) to accelerate the analysis of gravitational wave data, and ensuring the developed models are easily reusable promises to unlock new opportunities in multi-messenger astrophysics (MMA), and to enable wider use, rigorous validation, and sharing of developed models by the community. In this work, we demonstrate how connecting recently deployed DOE and NSF-sponsored cyberinfrastructure allows for new ways to publish models, and to subsequently deploy these models into applications using computing platforms ranging from laptops to high performance computing clusters. We develop a workflow that connects the Data and Learning Hub for Science (DLHub), a repository for publishing machine learning models, with the Hardware Accelerated Learning (HAL) deep learning computing cluster, using funcX as a universal distributed computing service. We then use this workflow to search for binary black hole gravitational wave signals in open source advanced LIGO data. We find that using this workflow, an ensemble of four openly available deep learning models can be run on HAL and process the entire month of August 2017 of advanced LIGO data in just seven minutes, identifying all four binary black hole mergers previously identified in this dataset, and reporting no misclassifications. This approach, which combines advances in AI, distributed computing, and scientific data infrastructure opens new pathways to conduct reproducible, accelerated, data-driven gravitational wave detection.
Seismograms are convolution results between seismic sources and the media that seismic waves propagate through, and, therefore, the primary observations for studying seismic source parameters and the Earth interior. The routine earthquake location and travel-time tomography rely on accurate seismic phase picks (e.g., P and S arrivals). As data increase, reliable automated seismic phase-picking methods are needed to analyze data and provide timely earthquake information. However, most traditional autopickers suffer from low signal-to-noise ratio and usually require additional efforts to tune hyperparameters for each case. In this study, we proposed a deep-learning approach that adapted soft attention gates (AGs) and recurrent-residual convolution units (RRCUs) into the backbone U-Net for seismic phase picking. The attention mechanism was implemented to suppress responses from waveforms irrelevant to seismic phases, and the cooperating RRCUs further enhanced temporal connections of seismograms at multiple scales. We used numerous earthquake recordings in Taiwan with diverse focal mechanisms, wide depth, and magnitude distributions, to train and test our model. Setting the picking errors within 0.1 s and predicted probability over 0.5, the AG with recurrent-residual convolution unit (ARRU) phase picker achieved the F1 score of 98.62% for P arrivals and 95.16% for S arrivals, and picking rates were 96.72% for P waves and 90.07% for S waves. The ARRU phase picker also shown a great generalization capability, when handling unseen data. When applied the model trained with Taiwan data to the southern California data, the ARRU phase picker shown no cognitive downgrade. Comparing with manual picks, the arrival times determined by the ARRU phase picker shown a higher consistency, which had been evaluated by a set of repeating earthquakes. The arrival picks with less human error could benefit studies, such as earthquake location and seismic tomography.
We report a new multi-GPU capable ab initio Hartree-Fock/density functional theory implementation integrated into the open source QUantum Interaction Computational Kernel (QUICK) program. Details on the load balancing algorithms for electron repulsion integrals and exchange correlation quadrature across multiple GPUs are described. Benchmarking studies carried out on up to 4 GPU nodes, each containing 4 NVIDIA V100-SMX2 type GPUs demonstrate that our implementation is capable of achieving excellent load balancing and high parallel efficiency. For representative medium to large size protein/organic molecular systems, the observed efficiencies remained above 86%. The accelerations on NVIDIA A100, P100 and K80 platforms also have realized parallel efficiencies higher than 74%, paving the way for large-scale ab initio electronic structure calculations.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.