Formal methods in robotic motion planning have recently emerged as an active research topic due to their correct-by-design nature, and most results have been based on non-probabilistic discrete models. To better handle environment uncertainties, sensor noise, and actuator imperfections, control problems for probabilistic systems such as Markov chains (MCs) and Markov decision processes (MDPs) have also been studied. Most existing methods are based either on probabilistic model checking or on reinforcement-learning-oriented optimization. On the other hand, in the literature on supervisory control of discrete event systems, supervisors are usually designed to be maximally permissive. In other words, a collection of schedulers satisfying the given specification, rather than a single scheduler, is synthesized at once. We are therefore motivated to propose a novel learning-based automated supervisor synthesis framework that automatically generates a permissive supervisor such that the supervised system satisfies the given specification. Our approach is based on a modified L* learning algorithm and runs iteratively. It is guaranteed to be correct and to terminate in finitely many steps.