“…More specifically, the algorithm uses a single-step computation of the neural network to approximate the performance index that would otherwise be obtained by iterating the dynamic programming algorithm. The method provides a feasible and effective way to address many optimal control problems; examples include cart-pole control [13,20], pendulum robot swing-up control [26], urban intersection traffic signal control [15], freeway ramp metering [6,27], playing Go-Moku [28], and so on. However, the learning inefficiency of RL is also inherited by ADP, though it can be remedied with a supervisor, yielding SADP.…”
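
The single-step idea in the quoted passage might be sketched as follows. This is a hypothetical illustration, not the paper's implementation: a linear critic stands in for the neural network, the toy system, cost, policy, and learning rate are all assumptions, and the key point is that each weight update uses only a one-step target (current cost plus discounted critic estimate at the next state) rather than a full dynamic-programming iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95   # discount factor (assumed)
alpha = 0.1    # learning rate (assumed)

def phi(x):
    # simple polynomial features of a scalar state (stand-in for a neural net)
    return np.array([1.0, x, x * x])

w = np.zeros(3)  # critic weights

def critic(x):
    # approximate performance index J(x) = w . phi(x)
    return phi(x).dot(w)

def step(x, u):
    # toy deterministic system x' = 0.9 x + u with cost U = x^2 + u^2 (assumed)
    return 0.9 * x + u, x * x + u * u

for _ in range(2000):
    x = rng.uniform(-1.0, 1.0)
    u = -0.5 * x                               # fixed hypothetical policy
    x_next, cost = step(x, u)
    target = cost + gamma * critic(x_next)     # single-step target, no full DP sweep
    w = w + alpha * (target - critic(x)) * phi(x)
```

After training, `critic(x)` approximates the discounted cumulative cost of the fixed policy, even though no update ever iterated the dynamic programming recursion to convergence.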