High-accuracy optimization is a key component of time-sensitive applications in computer science such as machine learning. In our previous research we developed single-GPU Iterative Discrete Approximation Monte Carlo Optimization (IDA-MCS) and multi-GPU IDA-MCS. However, because of the memory capacity constraints of the GPUs in a workstation, single-GPU and multi-GPU IDA-MCS may perform poorly or even fail on optimization problems with complicated shapes, such as cost functions with a large number of peaks. In this paper, we parallelize Iterative Discrete Approximation with CUDA-MPI programming and develop a GPU-cluster version of IDA-MCS with two parallelization strategies, Domain Decomposition and Local Search, in the Single Instruction Multiple Data style using CUDA 5.5 and MPICH2. We demonstrate the performance of GPU-cluster IDA-MCS by optimizing complicated cost functions. Computational results show that, for the same number of iterations and a cost function with millions of peaks, the accuracy of GPU-cluster IDA-MCS is roughly thousands of times higher than that of the conventional Monte Carlo Search method. The results also show that Domain Decomposition IDA-MCS achieves much higher optimization accuracy than Local Search IDA-MCS.
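For illustration only, the Domain Decomposition idea can be sketched serially in Python. This is a minimal sketch under stated assumptions, not the CUDA-MPI implementation described in this paper: the test function `cost`, the window-shrinking rule, the `shrink` factor, and all function names are hypothetical, and an ordinary loop stands in for the MPI ranks and GPUs.

```python
import math
import random


def cost(x):
    # Hypothetical multi-peak test function (not taken from the paper).
    # It is non-negative and has its global minimum 0 at x = 0.
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


def monte_carlo_search(f, lo, hi, n_samples, rng):
    """Plain Monte Carlo Search: keep the best of n_samples uniform draws."""
    samples = (rng.uniform(lo, hi) for _ in range(n_samples))
    return min(samples, key=f)


def iterative_search(f, lo, hi, n_samples, n_iters, shrink, rng):
    """Assumed refinement loop: resample in a shrinking window around the best point."""
    best = monte_carlo_search(f, lo, hi, n_samples, rng)
    half = (hi - lo) / 2.0
    for _ in range(n_iters):
        half *= shrink
        a = max(lo, best - half)  # keep the window inside [lo, hi]
        b = min(hi, best + half)
        candidate = monte_carlo_search(f, a, b, n_samples, rng)
        if f(candidate) < f(best):
            best = candidate
    return best


def domain_decomposition_search(f, lo, hi, n_parts, n_samples, n_iters, shrink, seed):
    """Split [lo, hi] into n_parts subdomains, search each one, keep the overall best."""
    width = (hi - lo) / n_parts
    results = []
    for rank in range(n_parts):  # each pass stands in for one MPI rank / GPU
        rng = random.Random(seed + rank)  # independent random stream per worker
        a = lo + rank * width
        b = min(hi, lo + (rank + 1) * width)
        results.append(iterative_search(f, a, b, n_samples, n_iters, shrink, rng))
    return min(results, key=f)  # reduction step: global best over all subdomains


if __name__ == "__main__":
    best = domain_decomposition_search(cost, -5.12, 5.12, 8, 200, 10, 0.5, seed=0)
    print(best, cost(best))
```

In a cluster setting, the loop over `rank` would presumably run as one process per GPU, with the final `min` replaced by a reduction across processes; that mapping is an assumption of this sketch rather than a description of the implementation.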