In this paper, we propose an efficient and accurate message-passing interface (MPI)-based parallel simulator for streamer discharges in three dimensions using the fluid model. First, we propose a new second-order semi-implicit scheme for the temporal discretization of the model that relaxes the dielectric relaxation time restriction. Moreover, it requires solving the Poisson-type equation only once at each time step, while the classical second-order explicit scheme typically need to do twice. Second, we introduce a geometric multigrid preconditioned FGMRES solver that dramatically improves the efficiency of solving the Poisson-type equation with either constant or variable coefficients. We show numerically that no more than 4 iterations are required for the Poisson solver to converge to a relative residual of 10 â8 during streamer simulations; the FGMRES solver is much faster than R&B SOR and other Krylov subspace solvers. Last but not least, all the methods are implemented using MPI. The parallel efficiency of the code and the fast algorithmic performances are demonstrated by a series of numerical experiments using up to 2560 cores on the Tianhe2-JK clusters. For applications, we study a double-headed streamer discharge as well as the interaction between two streamers, using up to 10.7 billion mesh cells. arXiv:1812.08606v3 [physics.comp-ph] 17 Sep 2019 Recently, Teunissen and Ebert reported on 3D streamer simulations using the parallel adaptive Afivo framework [35], which features adaptive mesh refinement (AMR), geometric multigrid methods for the Poisson equation and OpenMP parallelism. AMR performs well, however, further improvement could be made by replacing the OpenMP parallelism with message-passing interface (MPI) libraries to fully exploit the power of clusters, especially for simulations of very long streamers. Another advance in the MPI-based simulation was reported by Plewa, Eichwald, and Ducasse et al. [28], who used the successive over-relaxation iterative solver in the red and black strategy (R&B SOR) as the Poisson solver, and tested the parallelization and the scalability with cell numbers ranging from 8512 million and core numbers from 201600. Their use of high-performance computing clusters with an MPI implementation reduced the computational time. These previous works suggest an efficient simulator has advantages in the following aspects: (1) an efficient time integration scheme, which may reduce the time marching steps; (2) a fast algebraic elliptic solver, which accelerates the solution of the Poisson equation that dominates the total computing time;(3) a good parallelization, which allows to utilize the full power of modern clusters; (4) adaptive mesh strategies, which can reduce the number of DOFs. This paper contributes to the first three aspects. First,