In this paper, we investigate the impact of a two-step Markov update scheme on the reinforcement component of XCS, a family of accuracy-based learning classifier systems. We use a mathematical framework based on discrete-time dynamical system theory to analyze the stability and convergence of the proposed method. We also provide a frequency-domain analysis of the classifier parameters to examine how the two-step update rule improves XCS in both the transient and steady-state stages of learning. An experimental study on a multiplexer benchmark problem compares the proposed update rules with the original XCS. The results show faster convergence, better steady-state training accuracy, and lower sensitivity to variations in the learning rate.
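To illustrate the dynamical-system view, the sketch below treats the *baseline* single-step XCS prediction update, the Widrow-Hoff rule p ← p + β(P − p), as a first-order discrete-time linear system. It is not the two-step scheme proposed here, whose exact form is not given in this summary. Written as p_t = (1 − β) p_{t−1} + β P_t, the rule has transfer function H(z) = βz / (z − (1 − β)), a single pole at 1 − β, and is therefore stable for 0 < β < 2. The function names are hypothetical, and the MAM averaging that XCS applies to young classifiers is deliberately omitted.

```python
import cmath
import math


def widrow_hoff_trajectory(payoffs, beta, p0=0.0):
    """Baseline XCS prediction update p <- p + beta * (P - p), applied once per payoff.

    Hypothetical helper for illustration; omits XCS's MAM averaging for young classifiers.
    """
    p = p0
    trajectory = []
    for payoff in payoffs:
        p += beta * (payoff - p)
        trajectory.append(p)
    return trajectory


def pole(beta):
    # p_t = (1 - beta) p_{t-1} + beta P_t  =>  H(z) = beta z / (z - (1 - beta))
    return 1.0 - beta


def is_stable(beta):
    # A first-order discrete-time system is stable iff its pole lies inside the unit circle.
    return abs(pole(beta)) < 1.0


def magnitude_response(beta, omega):
    # |H(e^{j omega})| with H(z) = beta / (1 - (1 - beta) z^{-1})
    z_inv = cmath.exp(-1j * omega)
    return abs(beta / (1 - (1 - beta) * z_inv))


if __name__ == "__main__":
    traj = widrow_hoff_trajectory([1000.0] * 50, beta=0.2)
    print(traj[0], traj[-1])                  # first step and near-converged value
    print(is_stable(0.2), is_stable(2.5))     # pole 0.8 vs pole -1.5
    print(magnitude_response(0.2, 0.0))       # unit gain at DC
    print(magnitude_response(0.2, math.pi))   # attenuation at the Nyquist frequency
```

With β = 0.2 and a constant payoff of 1000, the estimate starts at 200 and the error shrinks by a factor of 0.8 per step. The magnitude response is 1 at DC and β/(2 − β) at the Nyquist frequency, so the baseline rule acts as a low-pass filter on the payoff signal. This is the kind of transient/steady-state trade-off that a frequency-domain comparison with a two-step rule would examine.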