We have repurposed Google tensor processing units (TPUs), application-specific chips developed for machine learning, into large-scale dense linear algebra supercomputers. The TPUs’ fast intercore interconnects (ICIs), physically two-dimensional network topology, and high-bandwidth memory (HBM) permit distributed matrix multiplication algorithms to rapidly become computationally bound. In this regime, the matrix-multiply units (MXUs) dominate the runtime, yielding impressive scaling, performance, and raw size: Operating in float32 precision, a full 2,048-core pod of third-generation TPUs can multiply two matrices with linear size
N = 2^20 = 1,048,576 in about 2 min. Via curated algorithms emphasizing large, single-core matrix multiplications, other tasks in dense linear algebra can similarly scale. As examples, we present 1) QR decomposition; 2) resolution of linear systems; and 3) the computation of matrix functions by polynomial iteration, demonstrated by the matrix polar factorization.
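The key idea, a 2D-blocked distributed matrix multiplication in which each core performs large, dense single-core matmuls on its blocks, can be sketched as follows. This is a minimal single-process NumPy illustration of the communication pattern only (on a TPU pod each block would live in one core's HBM and the block products would run on the MXUs, e.g. via JAX); the function name and grid layout are illustrative, not the paper's implementation.

```python
import numpy as np

def blocked_matmul(A, B, grid=2):
    """Multiply square matrices A and B via a grid x grid 2D block
    decomposition, the pattern behind SUMMA-style distributed matmul.
    Assumes grid evenly divides the matrix dimension."""
    n = A.shape[0]
    b = n // grid  # block size
    C = np.zeros((n, n), dtype=A.dtype)
    for i in range(grid):
        for j in range(grid):
            for k in range(grid):
                # Each block product is one large dense matmul; on a TPU
                # this is the compute-bound work the MXUs execute.
                C[i*b:(i+1)*b, j*b:(j+1)*b] += (
                    A[i*b:(i+1)*b, k*b:(k+1)*b]
                    @ B[k*b:(k+1)*b, j*b:(j+1)*b]
                )
    return C
```

Because every core's inner loop is a large dense matmul, arithmetic grows as O(n^3) while communication grows only as O(n^2), which is why the computation rapidly becomes compute-bound as n increases.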
We demonstrate the use of Google's cloud-based Tensor Processing Units (TPUs) to accelerate and scale up conventional (cubic-scaling) density functional theory (DFT) calculations. Utilizing 512 TPU cores, we accomplish the largest such DFT computation to date, with 247,848 orbitals, corresponding to a cluster of 10,327 water molecules with 103,270 electrons, all treated explicitly. Our work thus paves the way toward accessible and systematic use of conventional DFT, free of any system-specific constraints, at unprecedented scales.
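The cubic scaling referred to above comes from dense matrix steps such as diagonalizing a Fock-like matrix and building the density matrix from the occupied orbitals. The following is a hedged NumPy sketch of that generic step under standard closed-shell conventions; the names (F, n_occ) and the random test matrix are illustrative and not taken from the paper's actual implementation.

```python
import numpy as np

def density_matrix(F, n_occ):
    """Build a closed-shell density matrix from the n_occ lowest
    eigenvectors of a symmetric Fock-like matrix F. Both the
    diagonalization and the matmul are O(n^3), the steps that
    dominate conventional DFT at scale."""
    eps, C = np.linalg.eigh(F)    # eigenvalues in ascending order
    C_occ = C[:, :n_occ]          # occupied orbitals
    return 2.0 * C_occ @ C_occ.T  # factor 2 for doubly occupied orbitals
```

In this convention the density matrix satisfies P @ P = 2 P and trace(P) equals the number of electrons, which gives a quick correctness check.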
Tensor Processing Units (TPUs) were developed by Google exclusively to support large-scale machine learning tasks. TPUs can, however, also be used to accelerate and scale up other computationally demanding tasks. In this paper we repurpose TPUs for the challenging problem of simulating quantum spin systems. Consider a lattice model made of N spin-1/2 quantum spins, or qubits, with a Hamiltonian H = Σ_i h_i that is a sum of local terms h_i, and a wavefunction |Ψ⟩ consisting of 2^N complex amplitudes. We demonstrate the usage of TPUs for both (i) computing the ground state |Ψ_gs⟩ of the Hamiltonian H, and (ii) simulating the time evolution |Ψ(t)⟩ = e^{−itH}|Ψ(0)⟩ generated by this Hamiltonian starting from some initial state |Ψ(0)⟩. The bottleneck of the above tasks is computing the product H|Ψ⟩, which can be implemented with remarkable efficiency utilizing the native capabilities of TPUs. With a TPU v3 pod, with 2,048 cores, we simulate wavefunctions |Ψ⟩ of up to N = 38 qubits. The dedicated matrix multiplication units (MXUs), the high-bandwidth memory (HBM) on each core, and the fast inter-core interconnects (ICIs) together provide performance far beyond the capabilities of general-purpose processors.
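The bottleneck step H|Ψ⟩ described above can be sketched in a few lines: the 2^N amplitudes are viewed as a rank-N tensor of shape (2, ..., 2), and each local term h_i acts via a tensor contraction on the corresponding axes. This is a minimal single-core NumPy sketch for two-site terms; on a TPU pod the tensor would be sharded across cores and the contractions executed on the MXUs. Function names and the restriction to two-site terms are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def apply_local_term(psi, h, sites, N):
    """Apply a two-site operator h (a 4x4 matrix) to qubits `sites`
    of an N-qubit state vector psi of length 2**N."""
    t = psi.reshape((2,) * N)
    # h viewed as a rank-4 tensor: (out1, out2, in1, in2); contract the
    # input indices with the two site axes of the state tensor.
    t = np.tensordot(h.reshape(2, 2, 2, 2), t, axes=([2, 3], list(sites)))
    # tensordot puts the output indices first; move them back into place.
    t = np.moveaxis(t, [0, 1], list(sites))
    return t.reshape(-1)

def apply_H(psi, terms, N):
    """H|psi> for H = sum_i h_i, with terms a list of (h, sites) pairs."""
    out = np.zeros_like(psi)
    for h, sites in terms:
        out += apply_local_term(psi, h, sites, N)
    return out
```

Ground states can then be obtained by iterating such products (e.g. Lanczos or power methods), and time evolution by expanding e^{−itH}|Ψ(0)⟩ in repeated applications of H, so this one contraction dominates the runtime of both tasks.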