Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2019
DOI: 10.1145/3292500.3330949

PressLight: Learning Max Pressure Control to Coordinate Traffic Signals in Arterial Network

Cited by 175 publications (43 citation statements)
References 15 publications
“…We have applied five methods: the Fixed-time method [20], Q-learning (Center) [46], Q-learning (Edge) [46], Nash Q-learning [47], and our MAAC method. They were all trained for 1000 episodes in CityFlow [44] on the edge computing architecture we designed. As shown in Figure 12, the algorithms converged at around the 600-episode point.…”
Section: Results (mentioning, confidence: 99%)
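The training setup quoted above (Q-learning run for 1000 episodes in CityFlow) follows a standard tabular pattern; a minimal sketch is given below, assuming a hypothetical `env` wrapper that exposes discrete states and actions over a CityFlow engine. `N_ACTIONS`, the hyperparameters, and the episode count are illustrative assumptions, not the cited papers' code.

```python
import random
from collections import defaultdict

N_ACTIONS = 4  # e.g. four signal phases (assumed)

def train(env, episodes=1000, alpha=0.1, gamma=0.95, eps=0.1):
    # Q-table: state -> list of per-action values.
    Q = defaultdict(lambda: [0.0] * N_ACTIONS)
    returns = []
    for _ in range(episodes):
        s, done, ep_reward = env.reset(), False, 0.0
        while not done:
            # Epsilon-greedy action selection over the Q-table.
            if random.random() < eps:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: Q[s][i])
            s2, r, done = env.step(a)
            # One-step Q-learning update toward the bootstrapped target.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            ep_reward += r
        returns.append(ep_reward)  # the per-episode curve is what shows convergence (~episode 600 above)
    return Q, returns
```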
“…We use CityFlow [44], an open-source traffic simulator, as our experiment environment. We assume there are six traffic lights (intersection nodes, i.e., edge computing nodes) in one section of a city (as shown in Figure 11).…”
Section: Simulation Environment and Settings (mentioning, confidence: 99%)
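For reference, a minimal sketch of driving CityFlow directly is shown below. `cityflow.Engine`, `set_tl_phase`, `next_step`, and `get_lane_waiting_vehicle_count` are real CityFlow API calls; the config path, the six intersection ids, and the fixed-time phase schedule are assumptions standing in for the cited paper's scenario.

```python
import cityflow

# config.json must reference a roadnetFile/flowFile and set
# "rlTrafficLight": true so that phases can be set programmatically.
eng = cityflow.Engine("config.json", thread_num=1)

# Six signalized intersections; the ids are assumed, the real ones
# come from the road-network file.
intersections = [f"intersection_{i}_1" for i in range(1, 7)]

for step in range(3600):  # one simulated hour at a 1 s interval
    for inter_id in intersections:
        # Fixed-time baseline: cycle through 4 phases, 30 s each.
        eng.set_tl_phase(inter_id, (step // 30) % 4)
    eng.next_step()

# Per-lane waiting-vehicle counts, a common ingredient of RL states and rewards.
waiting = eng.get_lane_waiting_vehicle_count()
```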
“…The RL-based traffic signal control methods can be divided into three categories by control area: single-intersection traffic signal control, arterial traffic signal control, and network traffic signal control. RL-based methods were reported to outperform fixed-time and actuated methods regardless of their control areas [19][20][21][22]. However, these studies consider only unimodal traffic in their reward functions, limiting their ability to model the heterogeneous interests and complex interactions of different traffic modes at intersections.…”
Section: Related Work (mentioning, confidence: 99%)
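To make the unimodal-reward limitation quoted above concrete, a multimodal reward could weight per-mode delays, as in the hypothetical sketch below; the mode names and weights are illustrative assumptions, not taken from the cited studies.

```python
# Hypothetical mode weights (e.g. prioritizing buses by occupancy);
# the values are illustrative, not from the cited studies.
MODE_WEIGHTS = {"car": 1.0, "bus": 3.0, "bicycle": 1.5}

def multimodal_reward(delay_by_mode):
    """delay_by_mode maps a traffic mode to its total intersection delay (s)."""
    # Negative weighted delay: lower combined delay -> higher reward.
    return -sum(MODE_WEIGHTS[m] * d for m, d in delay_by_mode.items())
```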
“…However, different local optima in the policy space can correspond to qualitatively different strategies, which makes the above consensus problematic in RL tasks where the environment is unstable. For example, in adaptive traffic signal control (ATSC) [81,89,90] (a conceptual diagram and further examples are given in Figure 1), if two traffic flows are expected to travel quickly from their departure points to their target points, multiple control strategies with similar average commuting times may exist due to the combinatorial nature of traffic lights. The performance of a single policy obtained by reward maximization is therefore bound to suffer when traffic volumes subsequently change on other sections of the road network connected to the section that traffic traverses.…”
Section: Introduction (mentioning, confidence: 99%)