Thermal to visible face image translation aims to synthesize high-fidelity visible face images from their thermal counterparts, with particular emphasis on preserving facial identity. While remarkable progress has been achieved in the quality of the synthesized images, as well as in the associated face matching accuracy, interpreting the generation process from thermal to visible face images remains an open challenge. To tackle this challenge, we present a novel generic attention-guided generative adversarial network (AG-GAN) for thermal to visible image translation. The AG-GAN framework is based on an encoder network that directly generates attention feature maps from an input thermal image in either a supervised or an unsupervised fashion. A decoder network takes the attention maps and applies adaptive layer-instance normalization to reconstruct the corresponding visible image. We show that solving thermal to visible image translation tasks through AG-GAN significantly improves cross-spectral face matching accuracy and inherently supports model explanation.
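To make the described generator structure concrete, the following is a minimal PyTorch sketch of an attention-guided encoder followed by a decoder that applies adaptive layer-instance normalization (AdaLIN). All module names, layer widths, and hyperparameters here are illustrative assumptions for exposition only, not the paper's actual implementation or training setup.

```python
# Minimal sketch: attention-guided encoder + AdaLIN decoder (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaLIN(nn.Module):
    """Blend instance and layer statistics, then modulate with (gamma, beta)."""
    def __init__(self, num_features, eps=1e-5):
        super().__init__()
        self.eps = eps
        # rho interpolates between instance norm (rho=1) and layer norm (rho=0)
        self.rho = nn.Parameter(torch.full((1, num_features, 1, 1), 0.9))

    def forward(self, x, gamma, beta):
        in_mean = x.mean(dim=(2, 3), keepdim=True)
        in_var = x.var(dim=(2, 3), keepdim=True)
        ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)
        ln_var = x.var(dim=(1, 2, 3), keepdim=True)
        x_in = (x - in_mean) / torch.sqrt(in_var + self.eps)
        x_ln = (x - ln_mean) / torch.sqrt(ln_var + self.eps)
        out = self.rho * x_in + (1 - self.rho) * x_ln
        return out * gamma.unsqueeze(2).unsqueeze(3) + beta.unsqueeze(2).unsqueeze(3)

class AttentionEncoder(nn.Module):
    """Encode a thermal image and produce a spatial attention map over its features."""
    def __init__(self, ch=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, ch, 7, stride=1, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # class-activation-style attention: per-channel weights collapsed to a spatial map
        self.attn_fc = nn.Linear(ch * 4, 1, bias=False)

    def forward(self, x):
        feat = self.features(x)                               # (B, C, H, W)
        weights = self.attn_fc.weight                         # (1, C)
        attn = torch.sum(feat * weights.view(1, -1, 1, 1), dim=1, keepdim=True)
        attn = torch.sigmoid(attn)                            # (B, 1, H, W)
        return feat * attn, attn

class AdaLINDecoder(nn.Module):
    """Decode attended features into a visible image using AdaLIN modulation."""
    def __init__(self, ch=64):
        super().__init__()
        c = ch * 4
        self.style = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(c, 2 * c))        # predicts gamma, beta
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.norm1 = AdaLIN(c)
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(c, ch * 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode='nearest'),
            nn.Conv2d(ch * 2, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh(),
        )

    def forward(self, feat):
        gamma, beta = self.style(feat).chunk(2, dim=1)
        h = F.relu(self.norm1(self.conv1(feat), gamma, beta))
        return self.up(h)

class AGGANGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = AttentionEncoder()
        self.decoder = AdaLINDecoder()

    def forward(self, thermal):
        attended, attn_map = self.encoder(thermal)
        visible = self.decoder(attended)
        return visible, attn_map       # attn_map can be visualized for model explanation

if __name__ == "__main__":
    gen = AGGANGenerator()
    fake_visible, attention = gen(torch.randn(2, 1, 128, 128))
    print(fake_visible.shape, attention.shape)  # (2, 3, 128, 128), (2, 1, 32, 32)
```

Returning the attention map alongside the synthesized image reflects the explanation aspect noted in the abstract: the map indicates which thermal regions drive the reconstruction and can be inspected directly.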