Discrete state models are a common modeling tool in many areas. For example, Markov state models, a particular representative of this model family, have become one of the major instruments for the analysis and understanding of processes in molecular dynamics (MD). Here we extend the scope of discrete state models to the case of systematically missing scales, resulting in a nonstationary and nonhomogeneous formulation of the inference problem. We demonstrate how recently developed tools of nonstationary data analysis and information theory can be used to identify discrete state models that are simultaneously optimal (in terms of describing the given data) and maximally simple (in terms of complexity and causality). We apply the resulting formalism to a problem from molecular dynamics and show how the results can be used to understand spatial and temporal causality information beyond the usual assumptions. We demonstrate that the optimal explanation for appropriately discretized/coarse-grained MD torsion angle data in a polypeptide is given by causality that is localized both in time and in space, opening new possibilities for deploying percolation theory and stochastic subgrid-scale modeling approaches in the area of MD.

multiscale systems | probabilistic networks | Granger causality | nonstationarity | regularization

Discrete state modeling is a powerful tool in many areas of science, such as computational biophysics [where it is mostly used in the form of Markov state models (1-4)], materials science [e.g., deployed in percolation theory and Ising models (5)], bioinformatics [e.g., as probabilistic Boolean models for the analysis and control of complex biological networks (6)], and geosciences [e.g., used in the form of generalized linear regression models (7)].
A central issue of discrete state modeling is the identification of an optimal model for the discrete quantity of interest y (e.g., a Boolean variable or a probability measure), expressed as a function of other available discrete quantities x_1, x_2, ..., x_n (also Boolean variables or probability measures) and of all other potentially relevant quantities u (discrete and/or continuous variables). Inference of causality then implies identifying all x_i that have a statistically significant impact on y and distinguishing them from all those x_j that are insignificant for y. To give a concrete example, in the context of molecular dynamics the variable y may describe the probability for a certain torsion angle (e.g., from the protein backbone) to be in one of the discrete conformational states; x_1, x_2, ..., x_n can be the values of the probabilities for all torsion angles of this protein at previous times; and the variable u may represent all of the positions and velocities of individual atoms, simulation settings (e.g., temperature), force-field and solvent properties, etc. Understanding the causality in this situation means, for example, identifying the memory depth (e.g., in the context of Markov state models, where...
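The inference task sketched above can be made concrete with a toy example. The following is a minimal illustration, not the paper's actual method: it uses lagged mutual information between a discrete output y_t and each candidate input x_i at the previous time step as a simple stand-in for an information-theoretic significance criterion; all variable names, the dependence strengths, and the sequence length are hypothetical choices made for the sketch.

```python
import numpy as np

# Toy setup (illustrative, not from the paper): two candidate Boolean
# inputs x1, x2 and a Boolean output y whose state at time t depends
# only on x1 at time t-1. The goal is to recover this causal structure.
rng = np.random.default_rng(0)
T = 5000
x1 = rng.integers(0, 2, T)   # causally relevant input
x2 = rng.integers(0, 2, T)   # causally irrelevant input
y = np.zeros(T, dtype=int)
for t in range(1, T):
    p = 0.9 if x1[t - 1] == 1 else 0.1   # y_t depends only on x1's past
    y[t] = int(rng.random() < p)

def mutual_info(a, b):
    """Plug-in mutual information (in nats) between two binary sequences."""
    joint = np.zeros((2, 2))
    np.add.at(joint, (a, b), 1)          # empirical joint histogram
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])))

# Lagged dependence: compare y_t against each input at time t-1.
mi1 = mutual_info(y[1:], x1[:-1])   # large: x1's past is informative about y
mi2 = mutual_info(y[1:], x2[:-1])   # near zero: x2's past is not
print(mi1, mi2)
```

In practice a significance threshold for such a statistic would be obtained, e.g., from permutation surrogates (shuffling the candidate input destroys any lagged dependence), rather than by eyeballing the raw values as in this sketch.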