In high-performance computing (HPC), convolution operations account for a large proportion of the computation in convolutional neural networks, which are very common in image- and video-based deep learning applications. This paper therefore takes improving the performance of the convolution operation as its research direction. Convolution can be performed in many ways: computing it directly from the mathematical definition, converting it to a Fast Fourier Transform (FFT), converting it to batched matrix multiplication (im2col), or using the Winograd algorithm. For small filters, Winograd has unique advantages. This paper introduces an implementation of Winograd in the AMD-based ROCm environment, together with an optimization method for Winograd based on a multi-thread communication algorithm. Relative to the Winograd convolution in ROCm 2.9.0, the optimization in this paper increases the speed of the algorithm by more than 150%. In certain computing-power situations, the performance of the optimized algorithm approaches or even exceeds that of cuDNN and MIOpen.
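To illustrate why Winograd suits small filters, the following minimal sketch computes the standard one-dimensional F(2,3) case: two outputs of a 3-tap filter using 4 multiplications between transformed input and filter values, where the direct definition needs 6. It is a plain Python illustration, not the ROCm implementation or the multi-thread optimization described in this paper; the function names `winograd_f23` and `direct_f23` are assumptions introduced here.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter from the definition (6 multiplications)."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2,3).

    d: 4 input values, g: 3 filter taps.
    The filter transform (u0..u3) can be precomputed once per filter,
    so only the 4 products m1..m4 remain per output tile.
    """
    # Filter transform (reusable across all tiles that share the filter)
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by the 4 element-wise multiplications
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform
    return [m1 + m2 + m3, m2 - m3 - m4]
```

Both functions return the same pair of outputs for any 4-element input and 3-tap filter, up to floating-point rounding; the saving comes from amortizing the filter transform over many tiles.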