Non-rigid registration that performs well in all-weather and all-day/night conditions directly determines the reliability of visible (VIS) and infrared (IR) image fusion. Because of non-planar scenes and differences between IR and VIS cameras, non-linear transformation models are better suited to non-rigid image registration than the affine model. However, most non-linear models currently used for non-rigid registration are constructed from control points. To address the limited adaptiveness and generalization of control-point-based models, an adaptive enhanced affine transformation (AEAT) is proposed for image registration, generalizing the affine model from the linear to the non-linear case. First, a Gaussian-weighted shape context, which measures the structural similarity between multimodal images, is designed to extract putative matches from the edge maps of IR and VIS images. Second, to achieve global image registration, the optimal parameters of the AEAT model are estimated from the putative matches by a subsection-optimization strategy. Experimental results show that this approach is robust across different registration tasks and outperforms several competitive methods in both registration precision and speed.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
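For context, the affine model that AEAT generalizes can be fitted to putative point matches by linear least squares. The sketch below (plain NumPy, illustrative only) shows that linear baseline step; it does not reproduce the paper's non-linear AEAT enhancement, Gaussian-weighted shape context, or subsection optimization, and the function name `fit_affine` is ours, not the authors'.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of a 2-D affine transform mapping src -> dst.

    src, dst: (N, 2) arrays of putative matched points.
    Returns a (3, 2) matrix M; apply it to points p (N, 2) with
    np.hstack([p, np.ones((len(p), 1))]) @ M.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Homogeneous design matrix: each row is [x, y, 1]
    X = np.hstack([src, np.ones((len(src), 1))])
    # Solve X @ M ~= dst in the least-squares sense
    # (two independent 1-D problems, one per output coordinate)
    M, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return M

# Example: recover a known affine map from noiseless matches
rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(20, 2))
true_M = np.array([[1.1, 0.2],
                   [-0.1, 0.9],
                   [5.0, -3.0]])
mapped = np.hstack([pts, np.ones((20, 1))]) @ true_M
est_M = fit_affine(pts, mapped)
print(np.allclose(est_M, true_M))  # True (exact matches, no outliers)
```

With real IR/VIS matches the correspondences are noisy and contain outliers, which is one reason a robust estimation strategy over a richer-than-affine model, as the abstract describes, is needed in practice.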