A key element of contemporary computer vision, image fusion aims to improve the quality and interpretability of images by combining complementary data from several image sources or modalities. Building on recent advances in deep learning and matrix factorization, this paper presents a novel method for multi-modal image fusion that combines the strengths of Deep Convolutional Neural Networks (CNNs) and Non-Negative Matrix Factorization (NMF). Deep CNNs have proven remarkably effective at extracting features from images, capturing complex patterns and discriminative information. In the proposed technique, an ensemble of deep CNNs is trained on a diverse dataset of multi-modal images. These networks extract and encode pertinent features from each modality, yielding information-rich representations suitable for fusion. During fusion, the features derived from the CNNs are concatenated, producing a fused feature representation that faithfully expresses the input modalities. The main novelty is the two-stage integration of NMF: first, the fused feature representation is decomposed into non-negative basis vectors and coefficients; then, NMF is applied again to extract salient patterns from the fused feature maps. The non-negativity constraint in NMF preserves the natural structures and characteristics of the source images, resulting in fused images that are both visually pleasing and semantically interpretable. Visual inspection of the fused images demonstrates the method's ability to effectively capture important information from multiple modalities. Comparison with existing fusion approaches highlights the superior performance and robustness of the proposed method, which achieves an accuracy of approximately 99.12%.
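The fusion pipeline described above (concatenate CNN features, then factorize with NMF) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random non-negative arrays stand in for per-modality CNN feature maps (as would come from ReLU activations), and the shapes and component count are hypothetical choices.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CNN feature maps from two modalities:
# each is (n_spatial_positions, n_features), non-negative as after ReLU.
feat_mod1 = rng.random((64, 32))
feat_mod2 = rng.random((64, 32))

# Fusion by concatenation along the feature axis.
fused = np.concatenate([feat_mod1, feat_mod2], axis=1)  # shape (64, 64)

# NMF decomposes the fused representation into non-negative
# coefficients W and basis vectors H, so that fused ≈ W @ H.
model = NMF(n_components=8, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(fused)  # (64, 8) non-negative coefficients
H = model.components_           # (8, 64) non-negative basis vectors

# Reconstruction from the factorization; in the paper's pipeline the
# fused image is synthesized from such non-negative factors.
reconstruction = W @ H
```

Because both factors are constrained to be non-negative, every reconstructed value is a purely additive combination of basis patterns, which is the property the abstract credits for preserving the source images' natural structure.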