We examined the neural signature of directed exploration by contrasting MEG beta (16-30 Hz) power changes between disadvantageous and advantageous choices in a two-choice probabilistic reward task. Both types of choices were made after our participants had learned the probabilistic contingency between choices and their outcomes, i.e., after they had acquired an internal model of choice values. Therefore, the rare disadvantageous choices may have served exploratory, environment-probing purposes. The study yielded two main findings. First, decisions leading to disadvantageous choices took longer and were accompanied by greater, more widespread suppression of beta oscillations than advantageous decisions. The additional neural resources recruited by disadvantageous decisions strongly suggest that these decisions were deliberately exploratory. Second, the outcomes of disadvantageous and advantageous choices had qualitatively different effects on feedback-related beta oscillations. Only losses resulting from disadvantageous choices, not gains, were followed by late beta synchronization in the frontal cortex. Our results are consistent with a role for frontal beta oscillations in stabilizing the neural representation of the selected behavioral rule when an exploratory strategy conflicts with value-based behavior. Because punishment for an exploratory choice is congruent with that choice's low value in the reward history, it is likely to strengthen, through punishment-related beta oscillations, the representation of the exploratory choice's competitor: the internal utility model.