This study proposes a novel robust video tracking algorithm consisting of target detection, multi-feature fusion, and an extended Camshift. Firstly, a novel target detection method that integrates the Canny edge operator, three-frame differencing, and improved Gaussian mixture model (IGMM)-based background modelling is presented to detect targets. The IGMM-based background modelling divides video frames into meshes to avoid pixel-wise processing. In addition, the output of the target detection is utilised to initialise the IGMM and to accelerate the convergence of its iterations. Secondly, low-dimensional regional covariance matrices are introduced to describe video targets by fusing multiple features such as pixel location, colour index, rotation- and scale-invariant features, uniform local binary patterns, and directional derivatives. Thirdly, an extended Camshift based on adaptive kernel bandwidth and robust H∞ state estimation is proposed to predict the states of fast-moving targets and to reduce the number of mean shift iterations. Finally, the effectiveness of the proposed tracking algorithm is demonstrated via experiments.

2 Automatic detection of video targets

Researchers have proposed various target detection methods, such as background subtraction (BGS), adjacent-frame differencing, and appearance-based target modelling. In this section, an automatic target detection method that balances efficiency and accuracy is presented, as shown in Fig. 1, where I_k and B_k, respectively, represent the image frame and the background at