We present a novel system for detection, localization and tracking of multiple people, which fuses a multi-view computer vision approach with a radio-based localization system. The proposed fusion combines the best of both worlds, excellent computer-vision-based localization, and strong identity information provided by the radio system, and is therefore able to perform tracking by identification, which makes it impervious to propagated identity switches. We present comprehensive methodology for evaluation of systems that perform person localization in world coordinate system and use it to evaluate the proposed system as well as its components. Experimental results on a challenging indoor dataset, which involves multiple people walking around a realistically cluttered room, confirm that proposed fusion of both systems significantly outperforms its individual components. Compared to the radio-based system, it achieves better localization results, while at the same time it successfully prevents propagation of identity switches that occur in pure computer-vision-based tracking.