Pattern classification learning tasks are commonly used to explore learning strategies in human subjects. The universal and individual traits of learning such tasks reflect our cognitive abilities and have been of interest both psychophysically and clinically. From a computational perspective, these tasks are hard, because the number of patterns and rules one could consider even in simple cases is exponentially large. Thus, when we learn to classify we must use simplifying assumptions and generalize. Studies of human behavior in probabilistic learning tasks have focused on rules in which pattern cues are independent, and also described individual behavior in terms of simple, single-cue, feature-based models. Here, we conducted psychophysical experiments in which people learned to classify binary sequences according to deterministic rules of different complexity, including high-order, multicue-dependent rules. We show that human performance on such tasks is very diverse, but that a class of reinforcement learning-like models that use a mixture of features captures individual learning behavior surprisingly well. These models reflect the important role of subjects' priors, and their reliance on high-order features even when learning a low-order rule. Further, we show that these models predict future individual answers to a high degree of accuracy. We then use these models to build personally optimized teaching sessions and boost learning.

inference | information | maximum entropy

We regularly learn to classify sensory stimuli into novel categories, often using an impoverished sampling of the stimulus space and the underlying rule: even a "simple" task of classifying patterns of n bits into two categories requires an implicit mapping of the 2^n possible patterns, which means there are 2^(2^n) potential deterministic classification rules.
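As a quick illustration of this combinatorial explosion (a sketch for the reader, not part of the study), a few lines of code suffice to count both spaces: each deterministic rule is a binary labeling of all 2^n patterns, so the number of rules is 2 raised to the number of patterns.

```python
# For patterns of n bits: 2**n distinct patterns exist, and each
# deterministic classification rule assigns one of two labels to
# every pattern, giving 2**(2**n) possible rules.
def pattern_and_rule_counts(n: int) -> tuple[int, int]:
    num_patterns = 2 ** n
    num_rules = 2 ** num_patterns
    return num_patterns, num_rules

for n in (3, 5, 7):
    patterns, rules = pattern_and_rule_counts(n)
    print(f"n={n}: {patterns} patterns, {rules} deterministic rules")
```

Already at n = 7 there are 2^128 candidate rules, far beyond what any learner could enumerate, which motivates the simplifying assumptions discussed next.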
It is clear, then, that when we learn to classify, we cannot simply explore the space of rules and patterns, but must instead rely on simplifying assumptions.

Analysis of human learning of deterministic classification rules has focused on modeling the average behavior of subjects (1-3), and has explored the effect of rule complexity on the average level of success (4). Learning to classify according to a probabilistic rule is inherently ambiguous, so studies of such tasks have focused on simpler rules than those used in deterministic classification. For example, the weather prediction (WP) task (5) requires learning probabilistic associations between multiple cues and a label, where each cue carries independent information about the correct label. Analyses of the learning strategies of individual subjects in this task have compared single-cue or single-feature-based strategies, and the possibility of switching between such strategies (6, 7). Associative or Bayesian learning models that rely on simple stimulus features have been used to describe the diversity of individual learning dynamics that subjects exhibited and to compare between subjects (8), reflecting differences between healthy sub...