In applications such as trajectory tracking in mobile social networks and online recommendation systems, massive raw data are often incomplete for unpredictable or unavoidable reasons. Matrix completion algorithms are effective for reconstructing two-dimensional data, but sending raw data that contain personal, sensitive information to cloud computing nodes for matrix completion may expose private information. Homomorphic matrix completion is a promising approach that performs matrix completion while preserving privacy. However, CPU-based homomorphic matrix completion is slow, making it impractical to process multiple or large-scale data completion tasks in real time. In this paper, we propose a high-performance homomorphic matrix completion scheme that exploits commodity GPUs (graphics processing units), which are widely available in HPC servers and cloud computing nodes. First, we design and implement a baseline GPU-based homomorphic matrix completion and propose techniques to optimize memory accesses, GPU utilization, and communication. Second, we propose a shard mode for large-scale matrices that exceed GPU memory capacity. Third, we propose a multi-GPU mode to fully utilize the multiple GPUs in a computing node. Experimental results show that the proposed scheme is both fast and accurate. On matrices of varying sizes, the proposed scheme running on a single Tesla V100 GPU achieves up to 116.23× speedup over the CPU MATLAB implementation running on dual Xeon CPUs. The multi-GPU mode achieves up to 1.84× speedup on two GPUs over a single GPU. For large-scale matrices, the shard mode achieves up to 174.92× speedup on a single GPU over the CPU MATLAB implementation on two CPUs, and a further speedup of up to 1.35× when running on two GPUs in the multi-GPU mode.

INDEX TERMS GPU, homomorphic matrix completion, least squares minimization.