Multi-spectral camera set-ups may enable surveillance applications even under unfavorable conditions, such as low-light environments or scenes with vastly different lighting conditions. A high-resolution color camera, a high-dynamic-range camera and an infrared thermal camera were combined into a self-sufficient platform for continuous outdoor operation. The sheer amount of data produced poses serious challenges. It strains the available bandwidth and processing power, because self-sufficiency requires relatively low-power components. It also raises privacy concerns, because high-resolution, multi-spectral image data are sensitive information. Thus, relevant objects of interest had to be efficiently extracted, tracked and georeferenced on the sensor platform itself. These data, from one or more sensor heads, are then sent via WLAN or a mobile data link to a central control unit, possibly in anonymized form. There they may, e.g., prompt immediate action by a human operator in a disaster-response use case, or be stored for further offline analysis within a Smart City framework. Applying the classic stereo vision approach would require calibrating both the intrinsic and extrinsic parameters of all cameras. The multi-spectral nature of the input data complicates the correspondence problem for extrinsic parameter calibration and subsequent stereo matching, while intrinsic parameter calibration according to the pinhole camera model is made difficult because the cameras have to be focused at infinity. However, by making certain reasonable assumptions about the observed scene in typical use cases, and accepting a possible loss in localization accuracy, camera calibration could be limited to the bare minimum, and less computational power was required at run-time.
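The abstract does not state which scene assumptions are made. One common assumption in surveillance settings is that tracked objects stand on a flat ground plane. Under that assumption, with a known mounting height and downward tilt, a single camera can map an image point (e.g. the bottom center of an object's bounding box) to local ground coordinates without stereo matching. The following is a minimal sketch of that idea, not the authors' method. The function `pixel_to_ground`, the zero-roll camera, and the parameters `f`, `cx`, `cy`, `height` and `tilt_rad` are hypothetical and chosen for illustration only.

```python
import math
from typing import Optional, Tuple


def pixel_to_ground(
    u: float, v: float, f: float, cx: float, cy: float,
    height: float, tilt_rad: float,
) -> Optional[Tuple[float, float]]:
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane.

    Hypothetical sketch assuming: ideal pinhole camera (focal length f in
    pixels, principal point (cx, cy)), no lens distortion, zero roll, camera
    mounted `height` metres above a flat ground plane and pitched down by
    `tilt_rad`. World frame: X right, Y forward, Z up, camera at (0, 0, height).
    Returns (X, Y) in metres, or None if the ray does not hit the ground.
    """
    # Normalized ray direction in camera coordinates (x right, y down, z forward).
    a = (u - cx) / f
    b = (v - cy) / f
    s, c = math.sin(tilt_rad), math.cos(tilt_rad)
    # Ray in world coordinates: (a, c - b*s, -(b*c + s)).
    down = b * c + s  # downward component of the ray
    if down <= 0.0:
        return None  # ray at or above the horizon never reaches the ground
    t = height / down  # scale at which the ray drops by `height`
    return t * a, t * (c - b * s)


if __name__ == "__main__":
    # Camera 10 m high, tilted 45 degrees down, f = 1000 px, 1920x1080 image.
    print(pixel_to_ground(960, 540, 1000.0, 960, 540, 10.0, math.radians(45)))
```

With these example values, the principal point maps to a point roughly 10 m straight ahead of the camera. Converting such local coordinates to geographic ones would additionally require the sensor head's position and heading. Errors in the assumed height or tilt, or uneven ground, translate directly into localization errors, which is the kind of accuracy trade-off the abstract mentions.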