Reversal learning paradigms are widely used assays of behavioral flexibility, and their probabilistic versions are better suited to studying how reward outcomes are integrated over time. Prior research suggests that learning after a reversal differs from initial learning, with higher learning rates, a greater need for inhibitory control, and more perseveration after reversals. However, it is not well understood which aspects of stimulus-based reversal learning are unique to reversals, or whether and how differences between initial and post-reversal learning depend on reward probability. Here, we used a visual probabilistic discrimination and reversal learning paradigm in which male and female rats chose between a pair of stimuli associated with different reward probabilities. We compared measures of accuracy, rewards collected, omissions, latencies, win-stay/lose-shift strategies, and indices of perseveration between two reward probability schedules. We found that discrimination (pre-reversal) and reversal learning are behaviorally more distinct than similar: we observed longer choice latencies following incorrect trials, less use of win-stay and lose-shift strategies, and more perseveration in early reversal learning. Additionally, fitting reinforcement learning models to choice behavior revealed a lower sensitivity to the difference in subjective reward values (greater exploration) and higher learning rates in the reversal phase. Interestingly, a consistent difference between reward probability groups emerged: the richer environment was associated with longer reward collection latencies than the leaner environment. We also replicated previous reports of sex differences in reversal learning. Future studies should systematically compare the neural correlates of fine-grained behavioral measures to reveal possible dissociations in how the circuitry is recruited in each phase.
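
To make the two model-based quantities above concrete, the following minimal sketch shows how a learning rate and a sensitivity to value differences typically enter a two-choice reinforcement learning model, along with one common way of scoring win-stay and lose-shift. It is an illustration under stated assumptions, not the model fitted in this study: the delta-rule update, the logistic (softmax) choice rule, the initial values of 0.5, and all function and parameter names (`choice_probability`, `negative_log_likelihood`, `win_stay_lose_shift`, `alpha`, `beta`) are hypothetical choices made for this example.

```python
import math


def choice_probability(q_chosen, q_other, beta):
    """Logistic (two-option softmax) probability of the chosen option.

    beta is the assumed sensitivity to the value difference; a lower beta
    gives choices closer to chance, i.e. more exploration.
    """
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))


def negative_log_likelihood(choices, rewards, alpha, beta):
    """Negative log-likelihood of a choice sequence under a delta-rule model.

    choices: list of 0/1 stimulus indices; rewards: list of 0/1 outcomes.
    alpha is the assumed learning rate. Initial values of 0.5 are an
    assumption of this sketch.
    """
    q = [0.5, 0.5]
    nll = 0.0
    for choice, reward in zip(choices, rewards):
        p = choice_probability(q[choice], q[1 - choice], beta)
        nll -= math.log(p)
        # Delta-rule update of the chosen option only.
        q[choice] += alpha * (reward - q[choice])
    return nll


def win_stay_lose_shift(choices, rewards):
    """Return (win-stay, lose-shift) proportions across consecutive trials.

    Win-stay: fraction of rewarded trials followed by the same choice.
    Lose-shift: fraction of unrewarded trials followed by a different choice.
    A proportion is NaN when its denominator is zero.
    """
    wins = losses = stays_after_win = shifts_after_loss = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1] == 1:
            wins += 1
            stays_after_win += stayed
        else:
            losses += 1
            shifts_after_loss += not stayed
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift
```

In a sketch like this, `alpha` and `beta` would be estimated separately for the discrimination and reversal phases by minimizing `negative_log_likelihood` over each phase's trials, which is one way a phase difference in learning rate or exploration could be quantified.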