“…References [14], [15], [16], [17] employed convolution kernels of different sizes to extract the common and unique features of the source images. References [18], [19], [20] captured multilevel features of the source images via residual learning. Moreover, modern GAN-based approaches [21], [22], [23], [24], [25], [26], [27], [28], [29], [30] apply multi-granularity convolution kernels at the same feature level, yielding different receptive fields and in turn improving fusion performance.…”
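
The idea of applying kernels of several sizes at one feature level can be illustrated with a minimal pure-Python sketch. It is not the architecture of any cited work: the fixed box (mean) kernels stand in for learned kernels, odd kernel sizes are assumed, and the names `conv2d_same`, `box_kernel`, `multi_scale_features`, and `footprint` are hypothetical. Feeding a single-pixel impulse through each branch shows how the kernel size sets the receptive field of that branch.

```python
def conv2d_same(image, kernel):
    """Zero-padded 2D cross-correlation; output has the same size as the input."""
    h, w = len(image), len(image[0])
    k = len(kernel)          # assumed odd and square
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:   # pixels outside count as zero
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def box_kernel(k):
    """k x k mean filter, used here only as a stand-in for a learned kernel."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_features(image, sizes=(1, 3, 5)):
    """Run parallel branches on the same input, one feature map per kernel size."""
    return [conv2d_same(image, box_kernel(k)) for k in sizes]


def footprint(feature_map):
    """Number of output pixels influenced by the input (nonzero responses)."""
    return sum(1 for row in feature_map for v in row if v != 0.0)


# A 7 x 7 image with a single bright pixel in the centre.
impulse = [[0.0] * 7 for _ in range(7)]
impulse[3][3] = 1.0

features = multi_scale_features(impulse)
footprints = [footprint(f) for f in features]
print(footprints)  # → [1, 9, 25]
```

Each branch spreads the impulse over a k x k neighbourhood, so the three feature maps at the same level carry information from 1 x 1, 3 x 3, and 5 x 5 receptive fields. A fusion network would typically concatenate such maps along the channel axis before further processing.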