Abstract. We analyze the problem of processing very large datasets on parallel systems and find that natural approaches to parallelization fail for two reasons: one is connected to long-range correlations within the data, and the other stems from the nonscalar nature of the data. To overcome these difficulties, we propose a new data-processing paradigm based on statistical simulation of the datasets. For different types of data, this simulation is realized through three approaches: decomposition of the statistical ensemble, decomposition based on the principle of mixing, and decomposition over the indexing variable. Several examples show that the proposed approach scales very effectively.