“…In general terms, the definitions and representations of state space in existing papers (e.g., total number of queued vehicles [12, 19-21, 27, 29], length of queued vehicles [12], speed of vehicles [11,18,23,27], or traffic flow [15,30]) can be modified to relay more effective information about the environment, which leads to more accurate judgments about the actions. The action space has been defined as all available signal phases [11,18,20,27,30,31], or alternatively, it has been defined to maintain a sequence [22]. As for the definition of a reward function, most studies choose a reduction in the travel time of a vehicle [11,22,23], length of a vehicle queue [13,15], or the time delay in queuing [11,19,20,26,28,30].…”