Efficient full-chip thermal simulation is among the most challenging problems facing the EDA industry today, especially for modern 3D integrated circuits, because the linear systems arising from thermal modeling are so large that solving them requires unreasonably long computational times. While formulating the problem with a thermal equivalent circuit is a prevalent approach and the circuit itself is easy to construct, numerically simulating the resulting 3D network of equations is prohibitively time-consuming. Direct linear solvers cannot handle problems of this size, leaving iterative methods as the only feasible approach. In this paper, we propose a computationally efficient iterative method with a parallel preconditioning technique that exploits the resources of massively parallel architectures such as Graphics Processing Units (GPUs). Experimental results demonstrate that the proposed method achieves speedups of 2.2× in CPU execution and 26.93× in GPU execution over a state-of-the-art iterative method.
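The abstract does not specify which iterative method or preconditioner the paper uses. Purely as an illustration of the general technique, the minimal sketch below solves the kind of large, sparse, symmetric positive-definite system that a thermal equivalent circuit produces, using conjugate gradient with a Jacobi (diagonal) preconditioner, one of the simplest preconditioners that maps well to GPUs because each entry is applied independently. The grid size, right-hand side, and function names are illustrative assumptions, not the paper's setup.

```python
# A minimal sketch (not the paper's method): Jacobi-preconditioned
# conjugate gradient on an SPD system resembling a 3D thermal grid.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def laplacian_3d(n):
    """7-point finite-difference Laplacian on an n x n x n grid; it has
    the same sparsity structure as a 3D thermal-RC equivalent circuit."""
    d1 = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n))
    I = sp.identity(n)
    return (sp.kron(sp.kron(d1, I), I)
            + sp.kron(sp.kron(I, d1), I)
            + sp.kron(sp.kron(I, I), d1)).tocsr()

n = 20                       # 8,000 unknowns; real full-chip grids are far larger
A = laplacian_3d(n)
b = np.ones(A.shape[0])      # stand-in for the power-density (heat source) vector

# Jacobi (diagonal) preconditioner: trivially parallel, hence GPU-friendly
M = sp.diags(1.0 / A.diagonal())

x, info = spla.cg(A, b, M=M)
print("converged" if info == 0 else f"cg returned info={info}")
```

On a GPU, the same loop would typically be expressed with a sparse matrix-vector product kernel plus vector operations, which is where the massive parallelism the abstract refers to pays off; the preconditioner application here is a single element-wise multiply.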