Abstract. We introduce a novel local spatio-temporal descriptor designed to model, in a general manner, the spatio-temporal behavior of a tracked object of interest. The basic idea of the descriptor is to accumulate histograms of an image function's values over time. The histograms are computed over a regular grid of patches inside the object's bounding box and normalized so that they represent empirical probability distributions. Because the number of grid patches is fixed, the descriptor is invariant to changes in spatial scale. Depending on the level of temporal detail required, we introduce "first-order STA descriptors", which describe the average distribution of a chosen image function over time, and "second-order STA descriptors", which model the distribution of each histogram bin over time. We discuss entropy and χ² as similarity and saliency measures well suited to our descriptors. Our experimental validation ranges from the patch level to the object level. Our results show that STA, a simple yet powerful description of local space-time appearance, is well suited to machine learning and will be useful in video analysis, with potential applications in object detection, tracking, and background modeling.
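To make the construction concrete, the following is a minimal NumPy sketch of first- and second-order descriptors as summarized in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the image function is taken to be gray level in [0, 1], and the 4×4 grid, 8 histogram bins, quantization of each bin's time series into 5 levels for the second-order descriptor, the 0.5-scaled χ² histogram distance, and all helper names (`patch_histograms`, `sta_first_order`, `sta_second_order`, `entropy`, `chi_square`) are hypothetical choices.

```python
import numpy as np


def patch_histograms(frame, grid=4, bins=8, value_range=(0.0, 1.0)):
    """Normalized histograms over a fixed grid x grid layout of patches.

    `frame` is a 2-D array of image-function values (assumed: gray levels
    in [0, 1]) already cropped to the object's bounding box.
    Returns an array of shape (grid, grid, bins).
    """
    hists = np.zeros((grid, grid, bins))
    # A fixed number of patches regardless of box size: this is what makes
    # the descriptor's layout independent of spatial scale.
    row_blocks = np.array_split(np.arange(frame.shape[0]), grid)
    col_blocks = np.array_split(np.arange(frame.shape[1]), grid)
    for i, rows in enumerate(row_blocks):
        for j, cols in enumerate(col_blocks):
            counts, _ = np.histogram(frame[np.ix_(rows, cols)],
                                     bins=bins, range=value_range)
            # Normalize to an empirical probability distribution.
            hists[i, j] = counts / max(counts.sum(), 1)
    return hists


def _stack(frames, grid, bins):
    # Shape (T, grid, grid, bins): one set of patch histograms per frame.
    return np.stack([patch_histograms(f, grid, bins) for f in frames])


def sta_first_order(frames, grid=4, bins=8):
    """First order: average patch distribution over time."""
    return _stack(frames, grid, bins).mean(axis=0)


def sta_second_order(frames, grid=4, bins=8, levels=5):
    """Second order: distribution of each bin's value over time.

    Assumption: each bin value (a probability in [0, 1]) is quantized into
    `levels` equal-width levels. Returns shape (grid, grid, bins, levels).
    """
    per_frame = _stack(frames, grid, bins)
    idx = np.minimum((per_frame * levels).astype(int), levels - 1)
    out = np.zeros((grid, grid, bins, levels))
    for q in range(levels):
        out[..., q] = (idx == q).mean(axis=0)  # fraction of frames at level q
    return out


def entropy(p, axis=-1):
    """Shannon entropy in bits along `axis` (0 * log 0 treated as 0)."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log2(p), 0.0)
    return terms.sum(axis=axis)


def chi_square(p, q):
    """Chi-square histogram distance, 0.5 * sum((p - q)^2 / (p + q))."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    denom = p + q
    mask = denom > 0  # skip bins that are empty in both histograms
    return 0.5 * np.sum((p[mask] - q[mask]) ** 2 / denom[mask])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.random((40, 50)) for _ in range(10)]
    print(sta_first_order(frames).shape)   # (4, 4, 8)
    print(sta_second_order(frames).shape)  # (4, 4, 8, 5)
```

Because the grid is fixed, a box that is upsampled by an integer factor yields identical first-order histograms in this sketch; entropy of a patch distribution can then serve as a per-patch saliency score and `chi_square` as a distance between two descriptors of the same shape.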