“…In these works, the fuzzy approach is applied either to the reward/cost function ([2,14,15,25]) or to the dynamics of the system ([14,16,17]), in all cases within a framework of finite state and action spaces. With regard to the long-run expected average cost criterion, only two works were found: [10] and [13]. In [13], a Pareto optimal policy that maximizes the average expected fuzzy reward under the max-order is found.…”