The fusion of on-board sensor measurements and information transmitted via inter-vehicle communication has proven to be an effective way to increase the perception accuracy and extend the perception range of connected intelligent vehicles. Current approaches rely heavily on accurate self-localization of both the host and the cooperative vehicles. However, such information is not always available or accurate enough for effective cooperative sensing. In this paper, we propose a robust cooperative multi-vehicle tracking framework suited to situations where the self-localization information is inaccurate. Our framework consists of three stages. First, each vehicle perceives its surrounding environment with its on-board sensors and exchanges its local tracks through inter-vehicle communication. Then, an algorithm based on Bayesian inference is developed to match the tracks from the host and cooperative vehicles and simultaneously optimize their relative pose. Finally, the tracks associated with the same target are fused by fast covariance intersection based on information theory. Simulation results on both synthesized data and a high-fidelity physics-based platform show that our approach successfully performs cooperative tracking without the assistance of accurate self-localization.
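For concreteness, the sketch below illustrates one common closed-form ("fast") covariance intersection rule for fusing two associated track estimates whose cross-correlation is unknown. The function name, the determinant-based weight, and the 2-D example values are illustrative assumptions and not necessarily the exact fusion rule or implementation used in this paper.

```python
import numpy as np

def fast_covariance_intersection(x1, P1, x2, P2):
    """Fuse two estimates (x1, P1) and (x2, P2) of the same target.

    Illustrative fast covariance intersection: the mixing weight omega is
    computed in closed form from the covariance determinants instead of by
    an iterative optimization, which keeps the fusion step cheap.
    """
    det1, det2 = np.linalg.det(P1), np.linalg.det(P2)
    omega = det2 / (det1 + det2)  # closed-form weight (assumed variant), no search

    P1_inv, P2_inv = np.linalg.inv(P1), np.linalg.inv(P2)
    P_fused = np.linalg.inv(omega * P1_inv + (1.0 - omega) * P2_inv)
    x_fused = P_fused @ (omega * P1_inv @ x1 + (1.0 - omega) * P2_inv @ x2)
    return x_fused, P_fused

# Hypothetical example: fuse a host-vehicle track with a track received
# from a cooperative vehicle after the two have been associated.
x_host = np.array([10.0, 2.0]); P_host = np.diag([1.0, 4.0])
x_coop = np.array([10.5, 1.8]); P_coop = np.diag([3.0, 1.0])
x_f, P_f = fast_covariance_intersection(x_host, P_host, x_coop, P_coop)
```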