This paper considers a cognitive radio (CR) system in which multiple secondary users (SUs) search for spectrum opportunities created by the absence of primary users (PUs). The occupancy of each CR channel is modeled as a Markov chain, which is assumed to have only two states: idle and busy. Since the parameters of the Markov chain are unknown to the SUs a priori and the states evolve independently of the SUs' sensing and utilization, this problem can be formulated as a restless multi-armed bandit (RMAB) problem. We propose an efficient channel allocation algorithm for SUs, constructed by combining multiple single-user MAB policies. When the performance of the proposed algorithm is measured by regret, defined as the total reward difference from an ideal Bayesian policy in which the stationary probabilities are known to the SUs, the regret of the proposed algorithm appears to decrease over time, outperforming existing policies in the two-state Markov chain case. To assess the performance of the proposed algorithm more appropriately, we introduce a new definition of regret that uses a belief-vector-based Bayesian policy as the ideal policy. We observe experimentally that, under certain conditions, the newly defined regret of the proposed algorithm grows approximately logarithmically.
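To make the setting concrete, the sketch below simulates the two-state (idle/busy) Markov channel model with a single SU that senses channels using a UCB1 index on the empirical idle rate. This is only an illustrative single-user sketch under assumed transition probabilities; the paper's multi-user allocation policy and its regret definitions are not reproduced here.

```python
import math
import random

def simulate_ucb_sensing(transition, horizon, seed=0):
    """Single-SU spectrum sensing via UCB1 on empirical idle rates.

    transition[k] = (p_bi, p_ii): channel k's probability of being idle
    in the next slot given it is currently busy / idle (hypothetical
    parameters, not taken from the paper).
    """
    rng = random.Random(seed)
    K = len(transition)
    state = [1] * K              # channel states: 1 = idle, 0 = busy
    pulls = [0] * K              # times each channel was sensed
    idle_obs = [0] * K           # idle observations per channel
    total_reward = 0

    for t in range(1, horizon + 1):
        if t <= K:
            k = t - 1            # sense each channel once initially
        else:
            # UCB1 index: empirical idle rate plus exploration bonus
            k = max(range(K), key=lambda i: idle_obs[i] / pulls[i]
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        reward = state[k]        # unit reward if the sensed channel is idle
        total_reward += reward
        pulls[k] += 1
        idle_obs[k] += reward
        # Every channel evolves regardless of sensing: the "restless" part.
        for i in range(K):
            p_bi, p_ii = transition[i]
            p = p_ii if state[i] == 1 else p_bi
            state[i] = 1 if rng.random() < p else 0

    return total_reward, pulls
```

For example, `simulate_ucb_sensing([(0.8, 0.9), (0.2, 0.3)], 1000)` pits a mostly-idle channel against a mostly-busy one; the UCB index concentrates sensing on the first channel while still probing the second occasionally.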