Despite the broad application of the handheld thermal imaging cameras in firefighting, its usage is mostly limited to subjective interpretation by the person carrying the device. As remedies to overcome this limitation, object localization and classification mechanisms could assist the fireground understanding and help with the automated localization, characterization and spatio-temporal (spreading) analysis of the fire. An automated understanding of thermal images can enrich the conventional knowledge-based firefighting techniques by providing the information from the data and sensing-driven approaches. In this work, transfer learning is applied on multilabeling convolutional neural network architectures for object localization and recognition in monocular visual, infrared and multispectral dynamic images. Furthermore, the possibility of analyzing fire scene images is studied and their current limitations are discussed. Finally, the understanding of the room configuration (i.e., objects location) for indoor localization in reduced visibility environments and the linking with Building Information Models (BIM) are investigated.