Data-parallel accelerator devices such as Graphics Processing Units (GPUs) are providing dramatic performance improvements over even multi-core CPUs for lattice-oriented applications in computational physics. Models such as the Ising and Potts models continue to play a role in investigating phase transitions on small-world and scale-free graph structures. These models are particularly well suited to the performance gains possible using GPUs and relatively high-level device programming languages such as NVIDIA's Compute Unified Device Architecture (CUDA). We report on algorithms and CUDA data-parallel programming techniques for implementing Metropolis Monte Carlo updates for the Ising model, using bit-packed storage for the spins and adjacency (neighbour) lists to represent various graph structures in addition to regular hypercubic lattices. We also report on parallel performance gains and on the memory and performance tradeoffs of different GPU/CPU and algorithmic combinations.
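To make the two storage ideas named above concrete, the following is a minimal serial sketch in Python, not the CUDA implementation reported here. It assumes one bit per spin packed into 32-bit words, a compressed (offsets plus flat neighbour array) adjacency list, and the standard Ising energy with coupling J and no external field. All names (`pack_spins`, `ring_adjacency`, `metropolis_update`, `sweep`) and the choice of a periodic ring as the example graph are hypothetical and chosen only for illustration.

```python
import math
import random

WORD_BITS = 32  # assumed word size: 32 spins per packed word


def pack_spins(spins):
    """Pack a list of +1/-1 spins into integer words, one bit per spin (1 = up)."""
    words = [0] * ((len(spins) + WORD_BITS - 1) // WORD_BITS)
    for i, s in enumerate(spins):
        if s > 0:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def get_spin(words, i):
    """Read site i from the packed words and return it as +1 or -1."""
    return 1 if (words[i // WORD_BITS] >> (i % WORD_BITS)) & 1 else -1


def flip_spin(words, i):
    """Flip site i in place by toggling its bit."""
    words[i // WORD_BITS] ^= 1 << (i % WORD_BITS)


def ring_adjacency(n):
    """Neighbour list for a periodic 1D ring (illustrative graph only).

    Neighbours of site i are neighbours[offsets[i]:offsets[i + 1]], so the
    same layout also holds graphs with a varying number of links per site.
    """
    offsets = [2 * i for i in range(n + 1)]
    neighbours = []
    for i in range(n):
        neighbours += [(i - 1) % n, (i + 1) % n]
    return offsets, neighbours


def metropolis_update(words, offsets, neighbours, i, beta, rng, J=1.0):
    """Attempt one Metropolis flip of site i; return True if accepted."""
    s = get_spin(words, i)
    field = sum(get_spin(words, neighbours[k])
                for k in range(offsets[i], offsets[i + 1]))
    delta_e = 2.0 * J * s * field  # energy change if spin i is flipped
    if delta_e <= 0.0 or rng.random() < math.exp(-beta * delta_e):
        flip_spin(words, i)
        return True
    return False


def sweep(words, offsets, neighbours, beta, rng):
    """One sweep: as many randomly chosen update attempts as there are sites."""
    n = len(offsets) - 1
    return sum(metropolis_update(words, offsets, neighbours,
                                 rng.randrange(n), beta, rng)
               for _ in range(n))
```

A call such as `sweep(pack_spins([1] * 64), *ring_adjacency(64), beta=0.5, rng=random.Random(1))` performs 64 update attempts on a 64-site ring. The sketch updates sites one at a time; a data-parallel version must additionally ensure that neighbouring sites are not updated simultaneously, which this example does not address.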