TY - GEN
T1 - Time critic policy gradient methods for traffic signal control in complex and congested scenarios
AU - Rizzo, Stefano Giovanni
AU - Vantini, Giovanna
AU - Chawla, Sanjay
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/7/25
Y1 - 2019/7/25
N2 - Employing an optimal traffic light control policy can have a positive impact, both economic and environmental, on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low traffic volumes. This paper addresses the traffic light control problem in a complex scenario, namely a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues that interfere with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates under these episodic conditions, thus improving the algorithm's convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a simulated signalized roundabout with 11 phases. The proposed method avoids traffic jams and outperforms traditional time-splitting policies and standard Policy Gradient in average delay and effective capacity, while drastically decreasing emissions.
AB - Employing an optimal traffic light control policy can have a positive impact, both economic and environmental, on urban mobility. Reinforcement learning techniques have shown promising results in optimizing control policies for basic intersections and low traffic volumes. This paper addresses the traffic light control problem in a complex scenario, namely a signalized roundabout with heavy traffic volumes, with the aim of maximizing throughput and avoiding traffic jams. We formulate the environment with a realistic representation of states and actions and a capacity-based reward. We enforce episode terminal conditions to avoid unwanted states, such as long queues that interfere with other junctions in the vehicular network. A time-dependent baseline is proposed to reduce the variance of Policy Gradient updates under these episodic conditions, thus improving the algorithm's convergence to an optimal solution. We evaluate the method on real data and highly congested traffic, implementing a simulated signalized roundabout with 11 phases. The proposed method avoids traffic jams and outperforms traditional time-splitting policies and standard Policy Gradient in average delay and effective capacity, while drastically decreasing emissions.
KW - Policy gradient
KW - Reinforcement learning
KW - Roundabout modeling
KW - Traffic light control
UR - http://www.scopus.com/inward/record.url?scp=85071199280&partnerID=8YFLogxK
U2 - 10.1145/3292500.3330988
DO - 10.1145/3292500.3330988
M3 - Conference contribution
AN - SCOPUS:85071199280
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 1654
EP - 1664
BT - KDD 2019 - Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2019
Y2 - 4 August 2019 through 8 August 2019
ER -