Topological data analysis (TDA) characterizes the global structure of data
based on topological invariants such as persistent homology, whereas convolutional
neural networks (CNNs) capture local features within that global
structure. A combined model of TDA and CNN, a member of the family
of multimodal networks, simultaneously takes the image and the corresponding
topological features as input to the network for classification, thereby significantly
improving on the performance of a single CNN. This approach has
recently been successful in various applications. However, there is a lack of explanation
regarding how and why topological signatures, when combined with a CNN, improve
discriminative power. In this paper, we use persistent homology to compute topological
features and demonstrate, both qualitatively and quantitatively, the
effects of topological signatures on a CNN model. To this end, we propose and apply
Grad-CAM analysis of multimodal networks and a topological inverse image
map. For experimental validation, we use two well-known datasets:
the transient versus bogus image dataset and the HAM10000 dataset. Using Grad-
CAM analysis of multimodal networks, we demonstrate that topological features
force the image network of a CNN to focus more on significant and meaningful
regions across images rather than task-irrelevant artifacts such as background noise
and texture.
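
To make the notion of a topological feature computed by persistent homology more concrete, the following minimal sketch computes 0-dimensional persistence (births and deaths of connected components) for the sublevel-set filtration of a small grayscale image, using a union-find structure and the elder rule. It is an illustration under stated assumptions, not the pipeline used in this paper: the function name `sublevel_h0_persistence`, the choice of 4-connectivity, the non-empty rectangular input, and the toy image are all hypothetical, and higher-dimensional features (loops) are not covered.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def sublevel_h0_persistence(image: List[List[float]]) -> List[Tuple[float, float]]:
    """Return sorted (birth, death) pairs of connected components (H0) for the
    sublevel-set filtration of a non-empty 2D image, using 4-connectivity.
    Pairs with zero persistence are omitted (an illustrative choice)."""
    rows, cols = len(image), len(image[0])
    # Pixels enter the filtration in order of increasing intensity.
    order = sorted((image[r][c], r, c) for r in range(rows) for c in range(cols))
    parent: Dict[Pixel, Pixel] = {}
    birth: Dict[Pixel, float] = {}

    def find(x: Pixel) -> Pixel:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs: List[Tuple[float, float]] = []
    for value, r, c in order:
        p = (r, c)
        parent[p], birth[p] = p, value
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (r + dr, c + dc)
            if q not in parent:
                continue  # neighbor is outside the image or not yet added
            a, b = find(p), find(q)
            if a == b:
                continue
            # Elder rule: the component born later dies at the current value.
            old, young = (a, b) if birth[a] < birth[b] else (b, a)
            if birth[young] < value:
                pairs.append((birth[young], value))
            parent[young] = old
    # The component born at the global minimum never dies.
    pairs.append((order[0][0], float("inf")))
    return sorted(pairs)


# Toy image: four dark corners separated by a brighter ridge of value 5.
toy = [
    [0, 5, 1],
    [5, 5, 5],
    [2, 5, 3],
]
diagram = sublevel_h0_persistence(toy)
print(diagram)
```

In this toy image, the three corner minima with values 1, 2, and 3 each merge into the component of the global minimum when the ridge at value 5 enters the filtration, while the component born at 0 persists indefinitely. In a multimodal setting, such a diagram would typically be vectorized before being passed to the network alongside the image; that vectorization step is not shown here.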