In this paper, we present an efficient approach to detecting and tracking salient objects in videos. In general, visible color images in red-green-blue (RGB) offer better distinguishability for human visual perception, yet they suffer from illumination noise and shadows. In contrast, thermal images are less sensitive to these noise effects, although their distinguishability varies with environmental conditions. Fusing these two modalities therefore provides an effective solution to this problem. First, a background model is extracted from the visible images, and the foreground is detected by background subtraction. Meanwhile, adaptive thresholding is applied for foreground detection in the thermal domain, since human objects tend to have a higher temperature and thus appear brighter than the background. To deal with occlusion, prediction-based forward and backward tracking are employed to identify separate objects even when foreground detection fails. The proposed method is evaluated on OTCBVS, a publicly available color-thermal benchmark dataset. Promising results show that the proposed fusion-based approach can successfully detect and track multiple human objects.
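As a rough illustration of the detection stage only, the following minimal sketch combines the two foreground cues. It is not the implementation evaluated in this paper. The temporal-median background model, the fixed difference threshold `tau`, Otsu's method as the adaptive thermal threshold, and fusion by pixel-wise union (logical OR) are all assumptions, and every function name is hypothetical. The forward and backward tracking stage is not shown.

```python
import numpy as np


def median_background(frames):
    """Assumed background model: per-pixel temporal median of visible frames."""
    return np.median(np.stack(frames, axis=0), axis=0)


def visible_foreground(frame, background, tau=30.0):
    """Background subtraction: flag pixels whose largest channel difference exceeds tau."""
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    if diff.ndim == 3:
        diff = diff.max(axis=2)  # collapse RGB differences to one value per pixel
    return diff > tau


def otsu_threshold(gray):
    """Otsu's method on an 8-bit image: the level t maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class {pixels <= t}
    mu = np.cumsum(prob * np.arange(256))  # cumulative first moment up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    # Empty classes give 0/0; treat them as zero between-class variance.
    sigma_b = np.nan_to_num(sigma_b, nan=0.0, posinf=0.0)
    return int(np.argmax(sigma_b))


def thermal_foreground(thermal):
    """Warm objects appear bright, so keep pixels above the adaptive threshold."""
    return thermal > otsu_threshold(thermal)


def fused_foreground(frame, background, thermal, tau=30.0):
    """Assumed fusion rule: a pixel is foreground if either modality flags it."""
    return visible_foreground(frame, background, tau) | thermal_foreground(thermal)
```

Usage: build the background from a few object-free frames, then call `fused_foreground(frame, background, thermal)` on each registered visible/thermal pair. A pixel-wise intersection (`&`) would be a stricter alternative to the union used here, trading missed detections for fewer false alarms.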