Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process' mixed-state distribution. These results, in turn, show how widely used finite-order Markov models can fail as predictors and that mixed-state predictive features can offer a substantial improvement.

DOI: 10.1103/PhysRevE.95.051301

Often, we wish to find a minimal maximally predictive model consistent with available data. Perhaps we are designing interactive agents that reap greater rewards by developing a predictive model of their environment [1-6] or, perhaps, we wish to build a predictive model of experimental data because we believe that the resultant model gives insight into the underlying mechanisms of the system [7,8]. Either way, we are almost always faced with constraints that force us to efficiently compress our data [9].

Ideally, we would compress information about the past without sacrificing any predictive power. For stochastic processes generated by finite unifilar hidden Markov models (HMMs), one need only store a finite number of predictive features. The minimal such features are called causal states, their coding cost is the statistical complexity C_μ [10], and the implied unifilar HMM is the ε-machine [10,11]. However, most processes require an infinite number of causal states [7] and so cannot be described by finite unifilar HMMs. In these cases, we can only attain some maximal level of predictive power given constraints on the number of predictive features or their coding cost.
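To make the notions of causal states and statistical complexity concrete, the following sketch computes C_μ for the Golden Mean Process (binary sequences with no consecutive 0s), a standard textbook example whose ε-machine has two causal states. The transition matrix and process choice are illustrative and not taken from the paper.

```python
import numpy as np

# Two-state epsilon-machine of the Golden Mean Process.
# T[i, j] = probability of moving from causal state i to causal
# state j, summed over emitted symbols.
T = np.array([[0.5, 0.5],   # state A: emit 1 -> A (p=1/2), emit 0 -> B (p=1/2)
              [1.0, 0.0]])  # state B: emit 1 -> A (p=1)

# Stationary distribution pi over causal states: left eigenvector
# of T with eigenvalue 1, normalized to sum to 1.
vals, vecs = np.linalg.eig(T.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi /= pi.sum()

# Statistical complexity C_mu: Shannon entropy of pi, in bits.
C_mu = -np.sum(pi * np.log2(pi))
print(pi)    # ~ [2/3, 1/3]
print(C_mu)  # ~ 0.918 bits
```

For this process the coding cost of the causal states is finite, C_μ = log₂3 − 2/3 ≈ 0.918 bits; the paper's concern is precisely the processes for which no such finite-state computation exists.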
Equivalently, from finite data we can only infer a finite predictive model. Thus, we need to know how our predictive power grows with available resources.

Recent work elucidated the tradeoffs between resource constraints and predictive power for stochastic processes generated by countable unifilar HMMs or, equivalently, described by a finite or countably infinite number of causal states [12-14]. Few, though, have studied this tradeoff or provided bounds on it more generally. Here, we place bounds on resource-prediction tradeoffs in the limit of nearly maximal predictive power for processes with either a countable or an uncountable infinity of causal states by coarse-graining the mixed-state simplex [15]. These bounds give an operational interpretation to the fractal dimension of the mixed-state simplex and suggest routes toward quantifying the memory stored in a stochastic process when, as is typical, statistical complexity diverges.

* semarzen@mit.edu
† chaos@ucdavis.edu

Background. We consider a discrete-time, discrete-state stochastic process P generated by an HMM G, which com...
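The coarse-graining of the mixed-state simplex can be sketched numerically. The code below enumerates the mixed states (Bayesian beliefs over hidden states) reachable in a small nonunifilar HMM and counts how many bins of width ε on the belief simplex they occupy; the scaling of this count with ε probes the box-counting dimension invoked in the bounds. The labeled transition matrices are illustrative choices, not a model from the paper.

```python
import numpy as np

# Hypothetical 2-state nonunifilar HMM. T[x][i, j] is the joint
# probability of emitting symbol x and moving from hidden state i
# to hidden state j; rows of T[0] + T[1] sum to 1.
T = {0: np.array([[0.4, 0.1], [0.2, 0.0]]),
     1: np.array([[0.0, 0.5], [0.3, 0.5]])}

def update(eta, x):
    """Bayes-update the belief eta after observing symbol x."""
    v = eta @ T[x]
    s = v.sum()
    return (v / s, s) if s > 0 else (None, 0.0)

# Breadth-first enumeration of mixed states reachable from a
# uniform initial belief, up to a finite word length.
eta0 = np.array([0.5, 0.5])
beliefs = [eta0]
frontier = [eta0]
for _ in range(12):
    new = []
    for eta in frontier:
        for x in (0, 1):
            nxt, p = update(eta, x)
            if nxt is not None and p > 1e-12:
                new.append(nxt)
    frontier = new
    beliefs.extend(new)

# Coarse-grain: count occupied bins of width eps on the simplex
# (parametrized by the first belief coordinate).
eps = 0.01
occupied = {int(eta[0] / eps) for eta in beliefs}
print(len(occupied))
```

Repeating the count over a range of ε and fitting log(count) against log(1/ε) gives a crude estimate of the fractal dimension of the mixed-state set.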