We tackle the problem of tracking the human lower body as an initial step toward an automatic motion assessment system for clinical mobility evaluation, using a multimodal system that combines Inertial Measurement Unit (IMU) data, RGB images, and point cloud depth measurements. This system applies the factor graph representation to an optimization problem that provides 3-D skeleton joint estimations. In this paper, we focus on improving the temporal consistency of the estimated human trajectories to greatly extend the range of operability of the depth sensor. More specifically, we introduce a new factor graph factor based on Koopman theory that embeds the nonlinear dynamics of several lower-limb movement activities. This factor performs a two-step process: first, a custom activity recognition module based on spatial temporal graph convolutional networks recognizes the walking activity; then, a Koopman pose prediction of the subsequent skeleton is used as an a priori estimation to drive the optimization problem toward more consistent results. We tested the performance of this module on datasets composed of multiple clinical lowerlimb mobility tests, and we show that our approach reduces outliers on the skeleton form by almost 1 m, while preserving natural walking trajectories at depths up to more than 10 m.