Automatic player detection, labeling and tracking in broadcast soccer video are important yet challenging tasks. In this paper, we present a solution that performs automatic multiple-player detection, unsupervised labeling and efficient tracking. Players' positions and scales are determined by a boosting-based detector. Players' appearance models are learned in an unsupervised manner from hundreds of samples automatically collected by the detector. These models are then used to label players (Team A, Team B and Referee). Player tracking is achieved by Markov Chain Monte Carlo (MCMC) data association, and several data-driven dynamics are proposed to improve the Markov chain's efficiency. Test results on FIFA World Cup 2006 video demonstrate that our method achieves high detection and labeling precision and tracks reliably in scenes with multiple-player occlusion, moderate camera motion and pose variation.
Introduction

Automatic player localization, labeling and tracking are critical for analyzing team tactics and player activity, and for enhancing the enjoyment of broadcast sports video. The task is challenging because of difficulties such as player-to-player occlusion, similar player appearance, a varying number of players, abrupt camera motion, noise and video blur.

Many algorithms have been presented to deal with the multiple target tracking problem, such as particle filter [1 of these two works, a multi-camera system was used to get a stationary, high-resolution and wide-field view of the soccer game. This setting ensured that reliable background subtraction could be obtained. In our application, the camera is not fixed, which results in a moving background. Thus, we need robust and adaptive background modeling and effective object association techniques. In addition, unsupervised player labeling is preferred for its generalization ability.

In this paper, we propose a solution for player detection, labeling and tracking in broadcast soccer video. The system framework is illustrated in Figure 1. The whole procedure is a two-pass video scan. In the first scan, we (1) learn the video's dominant color via accumulated color histograms, and (2) learn players' appearance models in an unsupervised manner from hundreds of player samples collected by a boosted player detector. In the second scan, which is the testing procedure, we first use the dominant color for playfield segmentation and view-type classification. Then we apply a boosted player detector to localize players. Afterwards, the players are labeled as Team A, Team B or Referee using the previously learned models.
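As an illustration of the dominant-color step in the first scan, the following is a minimal sketch, not the implementation used in this paper. It assumes frames are given as flat lists of RGB pixels and that colors are quantized into 8 bins per channel; the color space, the bin count and the function names `quantize`, `accumulate_histogram`, `dominant_bin` and `playfield_mask` are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # assumed 8-bit RGB triple


def quantize(pixel: Pixel, bins_per_channel: int = 8) -> Tuple[int, ...]:
    """Map an 8-bit colour to a coarse histogram bin (assumed quantization)."""
    step = 256 // bins_per_channel
    return tuple(channel // step for channel in pixel)


def accumulate_histogram(frames: Iterable[Sequence[Pixel]],
                         bins_per_channel: int = 8) -> Counter:
    """Accumulate one colour histogram over all pixels of all frames."""
    hist: Counter = Counter()
    for frame in frames:
        hist.update(quantize(p, bins_per_channel) for p in frame)
    return hist


def dominant_bin(hist: Counter) -> Tuple[int, ...]:
    """Return the most frequent colour bin of the accumulated histogram."""
    if not hist:
        raise ValueError("histogram is empty")
    return hist.most_common(1)[0][0]


def playfield_mask(frame: Sequence[Pixel], dominant: Tuple[int, ...],
                   bins_per_channel: int = 8) -> List[bool]:
    """Mark pixels whose bin equals the dominant bin as playfield candidates."""
    return [quantize(p, bins_per_channel) == dominant for p in frame]


# Toy data: two tiny "frames" dominated by grass-like green pixels.
frames = [
    [(40, 160, 40)] * 6 + [(200, 20, 20)] * 2,
    [(45, 170, 50)] * 5 + [(250, 250, 250)] * 3,
]
hist = accumulate_histogram(frames)
dominant = dominant_bin(hist)
mask = playfield_mask(frames[0], dominant)
```

In this toy example the two green shades fall into the same bin, so the accumulated histogram peaks there and the mask separates grass-like pixels from the rest. A practical system would likely tolerate a range of bins around the peak rather than a single exact match.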
Finally, we perform data-driven MCMC association to generate players' trajectories, in which track length, label consistency and motion consistency are used as criteria for associating observations across frames.

The main contributions of our method are: (1) robust player detection achieved by background filtering and a boosted cascade detector; (2) unsupervised player appearance modeling, in which the referee can be identified in addition to the two teams' players without any ...