“…Relying on a learning mechanism, RL in its typical form does not require knowledge of the structure of the problem. Therefore, RL has been studied in a wide range of sequential decision problems, for example, virtual machine configuration [4], robotics [5], helicopter control [6], heating, ventilation, and air conditioning control [7], electricity trading [8], financial management [9], water resource management [10], and inventory management [11]. The acceptance of RL is credited to its effectiveness, its potential [12], its link to mammalian learning processes [13,14], and its model-free property [15].…”