Cone-beam CT (CBCT) has become the standard image guidance tool for patient setup in image-guided radiation therapy. However, due to its large illumination field, scattered photons severely degrade its image quality. While kernel-based scatter correction methods have been used routinely in the clinic, it is still desirable to develop Monte Carlo (MC) simulation-based methods due to their accuracy. However, the high computational burden of the MC method has prevented routine clinical application. This paper reports our recent development of a practical method of MC-based scatter estimation and removal for CBCT. In contrast with conventional MC approaches that estimate scatter signals using a scatter-contaminated CBCT image, our method used a planning CT image for MC simulation, which has the advantages of accurate image intensity and absence of image truncation. In our method, the planning CT was first rigidly registered with the CBCT. Scatter signals were then estimated via MC simulation. After scatter signals were removed from the raw CBCT projections, a corrected CBCT image was reconstructed. The entire workflow was implemented on a GPU platform for high computational efficiency. Strategies such as projection denoising, CT image downsampling, and interpolation along the angular direction were employed to further enhance the calculation speed. We studied the impact of key parameters in the workflow on the resulting accuracy and efficiency, based on which the optimal parameter values were determined. Our method was evaluated in numerical simulation, phantom, and real patient cases. In the simulation cases, our method reduced mean HU errors from 44 HU to 3 HU and from 78 HU to 9 HU in the full-fan and the half-fan cases, respectively. In both the phantom and the patient cases, image artifacts caused by scatter, such as ring artifacts around the bowtie area, were reduced. With all the techniques employed, we achieved computation time of less than 30 sec including the time for both the scatter estimation and CBCT reconstruction steps. The efficacy of our method and its high computational efficiency make our method attractive for clinical use.