The Internet of Things (IoT) is expected to incorporate massive numbers of machine-type communication (MTC) devices, such as vehicles, sensors, and wearable devices, which bring a large number of application tasks that need to be processed. Additionally, data collected from these devices must be processed in a timely, reliable, and efficient manner. Gesture recognition has enabled IoT applications such as human-computer interaction and virtual reality. In this work, we propose a cross-domain device-free gesture recognition (DFGR) model that exploits a 3D-CNN to obtain spatiotemporal features in Wi-Fi sensing. To adapt the sensing data to the 3D model, we carry out 3D data segmentation and supplementation in addition to signal denoising and time-frequency transformation. We demonstrate that our proposed model outperforms the state-of-the-art method in DFGR, even across three domain factors simultaneously, and that, owing to its less complicated hierarchical structure, it converges easily and is convenient to train.