“…Wu et al (2020) also test the performance of their multi-agent RL-TSC algorithm under different penetration rates. Interestingly, these works draw different conclusions about RL agents' performance at low penetration rates: Aziz et al (2019) find that their RL-TSC system cannot learn well at penetration rates below 40%, whereas Zhang et al (2020) and Wu et al (2020) demonstrate that their proposed RL-TSC methods are robust to low penetration rates; for instance, Zhang et al (2020) show that, at a 20% penetration rate, their RL-TSC system achieves an 80% decrease in waiting time compared with its performance at a 100% penetration rate. Key features of these studies, as well as of the method proposed in this paper, are summarized in Table 2, including the traffic information source, environment, agent, algorithm, benchmarks, state, action, reward, and gap.…”
Section: Reinforcement Learning Based TSC With Partial CV Information (mentioning)
confidence: 90%
“…Despite an increasing number of papers on RL-TSC using CV data published in recent years (Kim et al, 2019; Hussain et al, 2020; Yan et al, 2020; Liu et al, 2014, 2017), only a few pay attention to the performance of the proposed RL-TSC systems under low-penetration-rate scenarios. Aziz et al (2019) evaluate their previous RL-TSC system (Al Islam et al, 2018) under various penetration rates in two network-level real-world case studies. In Zhang et al (2020), a more comprehensive series of experiments on the proposed RL-TSC system under different penetration rates and traffic demand patterns is conducted at a synthetic isolated intersection.…”
Section: Reinforcement Learning Based TSC With Partial CV Information (mentioning)
confidence: 99%
“…The difference in the performance of RL-TSC algorithms at low penetration rates can be explained by the design of the RL-TSC systems and experiments. As mentioned above, Zhang et al (2020) use non-CV delay as part of the reward during training, whereas no non-CV information is used in the RL-TSC system of Aziz et al (2019): state and reward are computed only from CV data. Instead, CV data in Aziz et al (2019) is used to estimate queue lengths in a simplified way: the distance from the position of the last stopped CV to the stop line is taken as the queue length of that lane.…”
Section: Reinforcement Learning Based TSC With Partial CV Information (mentioning)
confidence: 99%
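The queue-length heuristic attributed above to Aziz et al (2019) can be sketched in a few lines. This is an illustrative reconstruction from the snippet's description, not the cited implementation; the function name, the upstream-distance convention, and the units are assumptions.

```python
def estimate_queue_length(stopped_cv_positions, stop_line_pos=0.0):
    """Approximate a lane's queue length from CV data only.

    The queue is taken to be the distance from the farthest (last)
    stopped CV to the stop line; positions are assumed to be distances
    upstream of the stop line, in meters. Returns 0.0 if no stopped CV
    is observed on the lane.
    """
    if not stopped_cv_positions:
        return 0.0
    return max(stopped_cv_positions) - stop_line_pos

# CVs stopped 12 m and 35 m upstream of the stop line:
print(estimate_queue_length([12.0, 35.0]))  # 35.0
```

Note how this heuristic under-estimates the true queue whenever non-CVs are queued behind the last stopped CV, and observes nothing at all on lanes with no stopped CV, which is exactly the failure mode that becomes severe at low penetration rates.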
“…As mentioned above, Zhang et al (2020) use non-CV delay as part of the reward during training, whereas no non-CV information is used in the RL-TSC system of Aziz et al (2019): state and reward are computed only from CV data. Instead, CV data in Aziz et al (2019) is used to estimate queue lengths in a simplified way: the distance from the position of the last stopped CV to the stop line is taken as the queue length of that lane. In addition, the RL-TSC agents in Aziz et al (2019) are trained at a 100% penetration rate but tested under various penetration rates, which may not generalize well.…”
Section: Reinforcement Learning Based TSC With Partial CV Information (mentioning)
confidence: 99%
“…Instead, CV data in Aziz et al (2019) is used to estimate queue lengths in a simplified way: the distance from the position of the last stopped CV to the stop line is taken as the queue length of that lane. In addition, the RL-TSC agents in Aziz et al (2019) are trained at a 100% penetration rate but tested under various penetration rates, which may not generalize well. As for Wu et al (2020), a recurrent neural network (RNN) is applied, which can learn historical information from time-continuous traffic-state data.…”
Section: Reinforcement Learning Based TSC With Partial CV Information (mentioning)
This paper develops a reinforcement learning (RL) scheme for adaptive traffic signal control (ATSC), called "CVLight", that leverages data collected only from connected vehicles (CVs). Seven types of RL models with various state and reward representations are proposed within this scheme, including the incorporation of CV delay and green light duration into the state and the use of CV delay as the reward. To further incorporate both CV and non-CV information into CVLight, an actor-critic-based algorithm, A2C-Full, is proposed, in which both CV and non-CV information is used to train the critic network, while only CV information is used to update the policy network and execute optimal signal timing. These models are compared at an isolated intersection under various CV market penetration rates. The full model with the best performance (i.e., minimum average travel delay per vehicle) is then selected and compared with state-of-the-art benchmarks under different levels of traffic demand, turning proportions, and dynamic traffic demands. Two case studies are performed, on an isolated intersection and on a corridor with three consecutive intersections in Manhattan, New York, to further demonstrate the effectiveness of the proposed algorithm under real-world scenarios. Compared to baseline models that use all vehicle information, the trained CVLight agent can efficiently control multiple intersections based solely on CV data and can achieve similar or even better performance when the CV penetration rate is no less than 20%.
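The asymmetric-information idea behind A2C-Full, in which the critic is trained on the full state (CV plus non-CV features) while the policy sees only the CV portion, can be sketched as below. This is a minimal linear-model illustration of the general asymmetric actor-critic pattern, not the paper's implementation; the feature dimensions, learning rates, and parameterization are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_CV, N_NONCV, N_ACTIONS = 4, 4, 2          # assumed feature/action sizes
W_actor = rng.normal(scale=0.1, size=(N_ACTIONS, N_CV))   # policy: CV input only
w_critic = rng.normal(scale=0.1, size=N_CV + N_NONCV)     # value: full input

def policy(cv_state):
    """Softmax policy over signal-timing actions, from CV features only."""
    logits = W_actor @ cv_state
    e = np.exp(logits - logits.max())
    return e / e.sum()

def value(full_state):
    """Linear state-value estimate from concatenated CV + non-CV features."""
    return w_critic @ full_state

def a2c_step(cv_s, noncv_s, action, reward, cv_s2, noncv_s2,
             gamma=0.99, lr=0.01):
    """One advantage actor-critic update with asymmetric inputs."""
    global w_critic, W_actor
    full_s = np.concatenate([cv_s, noncv_s])
    full_s2 = np.concatenate([cv_s2, noncv_s2])
    # Critic: TD(0) semi-gradient update using the full state.
    advantage = reward + gamma * value(full_s2) - value(full_s)
    w_critic = w_critic + lr * advantage * full_s
    # Actor: policy gradient, but only CV features enter the policy.
    probs = policy(cv_s)
    grad_logpi = -np.outer(probs, cv_s)       # d log pi / d W, softmax policy
    grad_logpi[action] += cv_s
    W_actor = W_actor + lr * advantage * grad_logpi
    return advantage
```

At deployment time only `policy` is needed, so the controller runs on CV data alone; the non-CV features are consumed only by the critic during training, which is what lets such a scheme exploit full-information simulation while remaining executable under partial observation.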