Mobile GPU computing, or computing on a System on Chip with an embedded GPU (SoC GPU), has recently come into great demand. Since these SoCs are designed for mobile devices running real-time applications such as image and video processing, highly efficient implementations of the wavelet transform are essential for these chips. In this paper, the author develops two SoC GPU based implementations of the discrete wavelet transform (DWT): signal based parallelization (sDWT) and coefficient based parallelization (cDWT), and evaluates the performance of the three-dimensional wavelet transform on the Tegra K1 SoC GPU. Computational results show that SoC GPU based DWT is significantly faster than SoC CPU based DWT. They also show that sDWT can generally satisfy the requirement of real-time processing (30 frames per second) at image sizes of 352×288, 480×320, 720×480 and 1280×720, while cDWT achieves real-time processing only at the small image sizes of 352×288 and 480×320.