Dynamic gesture trajectory recognition suffers from low accuracy and poor real-time performance due to occlusion, complex backgrounds, and fast hand motion. In this paper, we leverage machine vision to extract video keyframes with the three-frame differential method and use annotation software to build the dataset. The You Only Look Once v4 (YOLOv4) algorithm is improved to reduce the redundancy of the network structure and enhance the suitability of the feature map for hand gesture recognition. Combined with Deep-SORT real-time tracking, the hand motion trajectory is obtained by introducing appearance features, which effectively prevents track loss when the object is occluded. To avoid gradient vanishing during deep network training, the DenseNet-BC-169 network is used for gesture trajectory classification, balancing recognition rate against training time. Compared with FLIXT, the winner of the dynamic gesture recognition challenge, the final results show a 6.13% improvement in accuracy, and video processing on the IsoGD dataset reaches 31 fps, validating the effectiveness of this method.
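The keyframe-extraction step can be illustrated with a minimal sketch of three-frame differencing. This is not the paper's implementation; the threshold values, the `motion_ratio` criterion, and the function names are assumptions for illustration only.

```python
import numpy as np

def three_frame_diff(prev, curr, nxt, thresh=25):
    """Motion mask via three-frame differencing: the logical AND of the
    two thresholded absolute differences around the middle frame.
    thresh=25 is an assumed example value, not from the paper."""
    d1 = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    d2 = np.abs(nxt.astype(np.int16) - curr.astype(np.int16))
    return (d1 > thresh) & (d2 > thresh)

def is_keyframe(prev, curr, nxt, thresh=25, motion_ratio=0.01):
    """Keep the middle frame when enough pixels show motion
    (an assumed selection criterion for this sketch)."""
    mask = three_frame_diff(prev, curr, nxt, thresh)
    return mask.mean() > motion_ratio
```

Because the mask requires motion in both adjacent differences, it suppresses the "ghosting" that plain two-frame differencing leaves behind fast-moving hands.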