2010
DOI: 10.1002/jcc.21692

Shared‐memory parallelization of the TURBOMOLE programs AOFORCE, ESCF, and EGRAD: How to quickly parallelize legacy code

Abstract: The programs ESCF, EGRAD, and AOFORCE are parts of the TURBOMOLE program package and compute excited-state properties and ground-state geometric Hessians, respectively, for Hartree-Fock and density functional methods. The range of applicability of these programs has been extended by allowing them to use all CPU cores on a given node in parallel. The parallelization strategy is not new and duplicates what is standard today in the calculation of ground-state energies and gradients. The focus is on how t…

Cited by 57 publications (43 citation statements)
References 24 publications
“…37,38 The CAM-B3LYP calculations used NWChem 6.0 39 except in the case of the TD-DFT S1 relaxations, which were performed using GAMESS-US 40 (version 1 October 2010 R1).…”
Section: Computational Methodology (mentioning)
confidence: 99%
“…SCF convergence criteria were tightened to 10⁻⁸ and the DFT integration grid was of m4 size. Subsequent frequencies and polarizability derivatives calculations were carried out with the recently parallelised [43] aoforce and egrad modules of the TURBOMOLE package, respectively. The obtained frequencies were scaled down by a factor of 0.965 [44] and the spectrum was plotted with Lorentzian line shape as implemented in Molden [45].…”
Section: Materials and Reagents (mentioning)
confidence: 99%
“…The critical section (line 8-14) is responsible for serializing the caching procedure of temporary data on disk in order, as those file I/O operations are implemented in sequential manner, as opposed to random assessable manner. As the worst case, this critical section in JK routine, which fetches the PS grid points as well as the strips of Q matrix, only consumes 2 to 4% of the calculation time for one loop iteration.…”
Section: Hybrid Implementation of the Parallel Pseudospectral Algorithm (mentioning)
confidence: 99%
“…An alternative strategy [8][9][10][11] from an implementation perspective is to design and to implement a hardware-efficient parallel implementation for DFT algorithm. The major considerations supporting this approach are (1) the semiconductor industry has been shifted from frequency-driving paradigm to the multicore paradigm for many years and parallel implementation becomes crucial for scalability on large machines; (2) the continuous introduction of new instruction-level parallelism (i.e., SSE, AVX, etc.)…”
Section: Introduction (mentioning)
confidence: 99%