We introduce dynamic predictive coding, a new hierarchical model of spatiotemporal prediction and sequence learning in the cortex. The model assumes that higher cortical levels modulate the temporal dynamics of lower levels, correcting their predictions of dynamics using precision-weighted prediction errors. We tested this model using a two-level neural network in which the top-down modulation is implemented as a low-dimensional mixture of possible temporal dynamics. When trained on natural videos, the first-level neurons developed space-time receptive fields similar to those of simple cells in the primary visual cortex. The second-level responses spanned longer timescales and were more stable than first-level responses, mimicking temporal response hierarchies in the cortex. After adapting to a repeated visual sequence, the model displayed full recall of the sequence when cued with only its beginning, similar to sequence recall in the visual cortex. Our results suggest that sequence learning and temporal prediction in the cortex can be interpreted as dynamic predictive coding based on a hierarchical generative model of input sequences.
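The core mechanism described above, where a second-level state selects a low-dimensional mixture of candidate temporal dynamics for the first level and precision-weighted prediction errors correct the first-level state, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, random parameters, the mixture-weight map `H`, the candidate transition matrices `V`, and the assumption that the input and first-level state share the same dimensionality are all choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not from the paper):
# n1 = first-level state size, n2 = second-level state size,
# K = number of candidate temporal dynamics in the mixture
n1, n2, K = 20, 5, 3

# Randomly initialized parameters (in the model these would be learned)
V = rng.standard_normal((K, n1, n1)) * 0.1  # K candidate transition matrices
H = rng.standard_normal((K, n2)) * 0.1      # maps second-level state to mixture weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_next(r1, r2):
    """Top-down modulation: the second level selects a low-dimensional
    mixture of candidate temporal dynamics for the first level."""
    w = softmax(H @ r2)                   # mixture weights over the K dynamics
    V_eff = np.tensordot(w, V, axes=1)    # effective transition matrix (n1, n1)
    return np.maximum(V_eff @ r1, 0.0)    # predicted next first-level state

def correct(r1, r2, x_next, lr=0.1, precision=1.0):
    """Precision-weighted prediction error nudges the first-level state
    toward the actual next input."""
    err = precision * (x_next - predict_next(r1, r2))
    return r1 + lr * err

# One prediction-correction step on random data
r1 = rng.standard_normal(n1)      # first-level state
r2 = rng.standard_normal(n2)      # second-level state (slower timescale)
x_next = rng.standard_normal(n1)  # next input frame (assumed same size as r1)
r1 = correct(r1, r2, x_next)
```

Because the second level only sets the mixture weights rather than the full dynamics, it can vary on a slower timescale than the first level, which is one way the temporal response hierarchy described above could arise.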