“…Therefore, the optimal SWG design policy $\pi^{*}$ can be obtained following Bellman's optimality criterion as follows: $Q^{*}(s,a)=\sum_{s'} P(s' \mid s,a)\,\bigl[\, r(s,a,s') + \gamma \max_{a'} Q^{*}(s',a') \,\bigr]$, where the transition probability from $s$ to $s'$ when action $a$ is taken is represented as $P(s' \mid s,a)$. Our proposal estimates this value, which changes the patch sizes, as a Q-learning task [9]. For the SWG design policy $\pi$, the Q-value $Q^{\pi}(s,a)$, which maps the dimension state of each row of patches to the action of increasing or reducing its size, is defined as the expected discounted reward of taking action $a$ in the per-row dimension state $s$ according to the design policy (5).…”
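The per-row Q-learning formulation above can be sketched as a tabular loop. This is a minimal illustration, not the paper's method: the state space (discretised patch-size bins), the two actions (reduce/increase), the hyperparameters, and in particular the toy reward favouring a `TARGET` size bin are all assumptions standing in for the actual SWG figure of merit, which the excerpt does not specify.

```python
import random

N_SIZES = 10          # assumed: discretised per-row dimension states
ACTIONS = (0, 1)      # 0: reduce patch size, 1: increase patch size
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1
TARGET = 7            # hypothetical optimal size bin for the toy reward

def step(s, a):
    """Toy deterministic transition: move one size bin, clipped to range."""
    s2 = max(0, min(N_SIZES - 1, s + (1 if a == 1 else -1)))
    reward = -abs(s2 - TARGET)   # stand-in reward, NOT the paper's SWG metric
    return s2, reward

def train(episodes=500, horizon=20, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_SIZES)]
    for _ in range(episodes):
        s = rng.randrange(N_SIZES)
        for _ in range(horizon):
            # epsilon-greedy action selection over the per-row state
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # Standard Q-learning update toward the Bellman optimality target
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = train()
# Greedy policy extracted from the learned Q-table
policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_SIZES)]
```

Under this toy reward, the greedy policy should learn to grow rows below the target bin and shrink rows above it; in the paper, the reward would instead come from the SWG design objective.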