A number of studies on texture and ensemble perception have shown that humans can immediately estimate the average of spatially distributed visual information. The present study characterized the mechanisms involved in estimating averages of information distributed over both space and time. Observers viewed a rapid sequence of texture patterns in which element orientations were determined by dynamic Gaussian noise with variable spatial and temporal standard deviations (SDs). We found that discrimination thresholds increased beyond a certain spatial SD when the temporal SD was small, but when the temporal SD was large, thresholds remained nearly constant regardless of spatial SD. These data are at odds with the prediction that threshold is uniquely determined by the spatiotemporal SD. Moreover, a reverse correlation analysis revealed that observers judged the spatiotemporal average orientation largely on the basis of the spatial average orientation over the last few frames of the texture sequence, a recency effect widely observed in studies of perceptual decision making. The results are consistent with the notion that the visual system rapidly computes spatial ensembles and adaptively accumulates information over time to make a decision on the spatiotemporal average. A simple computational model based on this notion successfully replicated the observed data.

Humans achieve stable perception of scenes and objects at a glance in spite of the spatial complexity and uncertainty of natural images. While such perception seems to involve highly complicated and specialized neural processing, recent research has shown that perception builds upon image statistics computed relatively easily in the early stages of visual processing 1-4.
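As a rough illustration of the stimulus and the model idea summarized in the abstract, the Python sketch below simulates one possible version of both. Several details are assumptions, not taken from the study: the temporal SD jitters the mean orientation of each whole frame while the spatial SD jitters individual elements around that mean, so the pooled spatiotemporal SD is `sqrt(spatial_sd**2 + temporal_sd**2)`; the spatial ensemble is a per-frame circular mean of orientation computed by angle doubling; and the adaptive temporal accumulation is replaced by a fixed exponential recency weighting with a hypothetical time constant `tau`. All function names and parameter values are hypothetical.

```python
import numpy as np


def make_texture_sequence(mean_deg, spatial_sd, temporal_sd,
                          n_frames=10, n_elements=100, rng=None):
    """Element orientations in degrees, shape (n_frames, n_elements).

    ASSUMPTION: temporal noise shifts the mean of each whole frame, and
    spatial noise is drawn independently for every element in a frame.
    """
    rng = np.random.default_rng(rng)
    frame_means = mean_deg + rng.normal(0.0, temporal_sd, size=n_frames)
    element_noise = rng.normal(0.0, spatial_sd, size=(n_frames, n_elements))
    return frame_means[:, None] + element_noise


def spatiotemporal_sd(spatial_sd, temporal_sd):
    """Pooled SD over all elements and frames under the assumption above."""
    return float(np.hypot(spatial_sd, temporal_sd))


def spatial_ensemble_mean(oris_deg):
    """Per-frame circular mean of orientation (period 180 deg).

    Orientations are doubled so that 0 and 180 deg map to the same angle,
    averaged as unit vectors, and the resulting angle is halved.
    """
    doubled = np.deg2rad(2.0 * np.asarray(oris_deg))
    angle = np.arctan2(np.sin(doubled).mean(axis=-1),
                       np.cos(doubled).mean(axis=-1))
    return np.rad2deg(angle) / 2.0


def recency_weights(n_frames, tau=2.0):
    """HYPOTHETICAL exponential weights favouring the final frames (sum to 1)."""
    lag = np.arange(n_frames)[::-1]  # lag 0 corresponds to the last frame
    w = np.exp(-lag / tau)
    return w / w.sum()


def model_response(oris_deg, tau=2.0, internal_sd=0.0, rng=None):
    """Return +1 if the noisy weighted estimate exceeds the 0 deg reference, else -1."""
    rng = np.random.default_rng(rng)
    frame_avgs = spatial_ensemble_mean(oris_deg)                    # spatial ensemble per frame
    estimate = recency_weights(len(frame_avgs), tau) @ frame_avgs  # recency-weighted pooling
    estimate += rng.normal(0.0, internal_sd)                        # decision noise
    return 1 if estimate > 0 else -1


def temporal_kernel(n_trials=2000, n_frames=10, spatial_sd=10.0,
                    temporal_sd=5.0, tau=2.0, internal_sd=1.0, seed=0):
    """Reverse correlation with the true mean fixed at 0 deg: mean per-frame
    ensemble orientation on +1 trials minus that on -1 trials."""
    rng = np.random.default_rng(seed)
    frame_avgs = np.empty((n_trials, n_frames))
    responses = np.empty(n_trials)
    for i in range(n_trials):
        oris = make_texture_sequence(0.0, spatial_sd, temporal_sd,
                                     n_frames=n_frames, rng=rng)
        frame_avgs[i] = spatial_ensemble_mean(oris)
        responses[i] = model_response(oris, tau=tau,
                                      internal_sd=internal_sd, rng=rng)
    return (frame_avgs[responses > 0].mean(axis=0)
            - frame_avgs[responses < 0].mean(axis=0))
```

Calling `temporal_kernel()` returns one value per frame; under these assumptions the largest values fall on the final frames, which is the kind of recency profile a reverse correlation analysis would reveal. Because the weighting here is fixed rather than adaptive, the sketch is not intended to reproduce the threshold pattern across spatial and temporal SDs reported in the study.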
A vast psychophysical literature has suggested that the visual system is capable of rapidly estimating the characteristics of an ensemble of complex elements (e.g., objects, faces) 5-8 as well as discriminating textures defined by simple visual features such as form, color and motion 9-16. These studies offer clear evidence that the visual system automatically extracts a statistical representation of the spatial properties of the image. Such statistical visual representations are thought to be subserved by neural mechanisms in early visual cortex with large spatial receptive fields or by cortico-cortical interactions 17-23.

Visual inputs inherently contain much temporal uncertainty owing to gaze shifts and object motion, and little is known about how the mechanisms that extract spatial statistics cope with such temporal uncertainty. Psychophysical studies have examined how performance in orientation discrimination or global form detection with dynamic texture patterns varies as a function of stimulus duration. The results revealed that temporal summation is relatively short (extending over a few hundred milliseconds), consistent with the idea that spatial statistics are computed rapidly by low-level mechanisms 24,25. However, experiments using stochastic motion stimuli have shown that detecting global and biological motion requires a much longe...