In this paper, we study user localization and tracking in a reconfigurable intelligent surface (RIS) aided multiple-input multiple-output (MIMO) system, where a multi-antenna base station (BS) and multiple RISs are deployed to assist the localization and tracking of a multi-antenna user. By establishing a probability transition model for user mobility, we develop a message-passing algorithm, termed the Bayesian user localization and tracking (BULT) algorithm, to estimate and track the user position and the angles-of-arrival (AoAs) at the user in an online fashion. We also derive the Bayesian Cramér-Rao bound (BCRB) to characterize the fundamental performance limit of the considered tracking problem. To improve the tracking performance, we optimize the beamforming design at the BS and the RISs to minimize the derived BCRB. Simulation results show that the BULT algorithm performs close to the derived BCRB and significantly outperforms counterpart algorithms that do not exploit the temporal correlation of the user location.