Recent advance in high performance computing (HPC) resources has opened the possibility to expand the scope of density functional theory (DFT) simulations toward large and complex molecular systems. This work proposes a numerically robust method that enables scalable diagonalizations of large DFT Hamiltonian matrices, particularly with thousands of computing CPUs (cores) that are usual these days in terms of sizes of HPC resources. The well-known Lanczos method is extensively refactorized to overcome its weakness for evaluation of multiple degenerate eigenpairs that is the substance of DFT simulations, where a multilevel parallelization is adopted for scalable simulations in as many cores as possible. With solid benchmark tests for realistic molecular systems, the fidelity of our method are validated against the locally optimal block preconditioned conjugated gradient (LOBPCG) method that is widely used to simulate electronic structures. Our method may waste computing resources for simulations of molecules whose degeneracy cannot be reasonably estimated. But, compared to LOBPCG method, it is fairly excellent in perspectives of both speed and scalability, and particularly has remarkably less (< 10%) sensitivity of performance to the random nature of initial basis vectors. As a promising candidate for solving electronic structures of highly degenerate systems, the proposed method can make a meaningful contribution to migrating DFT simulations toward extremely large computing environments that normally have more than several tens of thousands of computing cores. K E Y W O R D S degenerate eigenpairs, density functional theory, high performance computing, Lanczos iterations, large-scale electronic structures 1 | I N TR ODU C TI ON Density functional theory (DFT) calculations are popular to predict various quantum phenomena in solid and molecular systems such as reaction mechanisms, charge transports, phase transitions, and vibrational excitations. [1-3] Theoretical understanding of those natural phenomena ofteninvolves simulations of systems that contain thousands of electrons, which require a huge computational cost that cannot be afforded by a single personal computer any more. Fortunately, the possibility to tackle such computationally expensive problems within reasonable times is opened by high performance computing (HPC) resources that normally consist of many computing cores. As the size of computing resources has been continuously increased with device-downscaling, nowadays it is no more difficult to find HPC clusters that have more than 10 5 cores. [4] Consequently, various implementations and methodologies are evolved and proposed to simulate large-scale electronic structures by leveraging HPC resources with parallel computing. [5][6][7] Real-space approaches are regarded as the reasonable strategy to explore large-scale electronic structures, [8][9][10][11] since they show better systemsize scalability and higher parallel efficiency than atom-centered basis set approaches. Packages of electronic structur...