Deep Reinforcement Learning Approach to Solve Dynamic Vehicle Routing Problem with Stochastic Customers

Self Cite

Police patrol aims to fulfill two main objectives namely to project presence and to respond to incidents in a timely manner. Incidents happen dynamically and can disrupt the initially-planned patrol schedules. The key decisions to be made will be which patrol agent to be dispatched to respond to an incident and subsequently how to adapt the patrol schedules in response to such dynamically-occurring incidents whilst still fulfilling both objectives; which sometimes can be conflicting. In this paper, we define this real-world problem as a Dynamic Bi-Objective Police Patrol Dispatching and Rescheduling Problem and propose a solution approach that combines Deep Reinforcement Learning (specifically neural networks-based Temporal-Difference learning with experience replay) to approximate the value function and a rescheduling heuristic based on ejection chains to learn both dispatching and rescheduling policies jointly. To address the dual objectives, we propose a reward function that implicitly tries to maximize the rate of successfully responding to an incident within a response time target while minimizing the reduction in patrol presence without the need to explicitly set predetermined weights for each objective. The proposed approach is able to compute both dispatching and rescheduling decisions almost instantaneously. Our work serves as the first work in the literature that takes into account these dual patrol objectives and real-world operational consideration where incident response may disrupt existing patrol schedules.

Section: Methodsmentioning

confidence: 99%

Section: Reinforcement Learning Approach To Solve Routing and Schedul...mentioning

confidence: 99%

See 1 more Smart Citation

Reinforcement Learning Approach to Solve Dynamic Bi-objective Police Patrol Dispatching and Rescheduling Problem

Joe

Lau

Pan

2022

Self Cite

“…Gombolay et al (2018) highlight the value of incorporating human expertise and heuristics in RL performance on VRPs. Joe and Lau (2020) show that combining RL and a meta-heuristic is effective for centralized solving of dynamic VRP problems. In the SOL domain, Irannezhad, Prato, and Hickman (2020) successfully incorporate a multi-agent RL solution into a full-fledged port decision support system and evaluate agent collaboration strategies, showing that a cooperative strategy, rather than one focused on individual reward maximization, results in the highest overall vehicle utilization and lowest travel distance and costs.…”

Section: Reinforcement Learningmentioning

confidence: 98%

Talking Trucks: Decentralized Collaborative Multi-Agent Order Scheduling for Self-Organizing Logistics

Pingen¹,

Ommeren²,

Leeuwen³

et al. 2022

Logistics planning is a complex optimization problem involving multiple decision makers. Automated scheduling systems offer support to human planners; however state-of-the-art approaches often employ a centralized control paradigm. While these approaches have shown great value, their application is hindered in dynamic settings with no central authority. Motivated by real-world scenarios, we present a decentralized approach to collaborative multi-agent scheduling by casting the problem as a Distributed Constraint Optimization Problem (DCOP). Our model-based heuristic approach uses message passing with a novel pruning technique to allow agents to cooperate on mutual agreement, leading to a near-optimal solution while offering low computational costs and flexibility in case of disruptions. Performance is evaluated in three real-world field trials with a logistics carrier and compared against a centralized model-free Deep Q-Network (DQN)-based Reinforcement Learning (RL) approach, a Mixed-Integer Linear Programming (MILP)-based solver, and both human and heuristic baselines. The results demonstrate that it is feasible to have virtual agents make autonomous decisions using our DCOP method, leading to an efficient distributed solution. To facilitate further research in Self-Organizing Logistics (SOL), we provide a novel real-life dataset.

“…Because RL has the ability to adapt to dynamic changes in the workload (environment) and handle the non-trivial consequences of chosen policies (actions), it is a good fit for the problems encountered in this paper. We consider cache replacement as a decision-making problem for choosing different replacement policies given the corresponding workload distribution (Joe and Lau 2020). At the same time, to describe the differences between different workloads, we use a neural network (NN) to represent the diverse workload distributions.…”

Section: Deep Reinforcement Learningmentioning

confidence: 99%

An End-to-End Automatic Cache Replacement Policy Using Deep Reinforcement Learning

Zhang¹,

Wang²,

Shi³

et al. 2022

In the past few decades, much research has been conducted on the design of cache replacement policies. Prior work frequently relies on manually-engineered heuristics to capture the most common cache access patterns, or predict the reuse distance and try to identify the blocks that are either cache-friendly or cache-averse. Researchers are now applying recent advances in machine learning to guide cache replacement policy, augmenting or replacing traditional heuristics and data structures. However, most existing approaches depend on the certain environment which restricted their application, e.g, most of the approaches only consider the on-chip cache consisting of program counters (PCs). Moreover, those approaches with attractive hit rates are usually unable to deal with modern irregular workloads, due to the limited feature used. In contrast, we propose a pervasive cache replacement framework to automatically learn the relationship between the probability distribution of different replacement policies and workload distribution by using deep reinforcement learning. We train an end-to-end cache replacement policy only on the past requested address through two simple and stable cache replacement policies. Furthermore, the overall framework can be easily plugged into any scenario that requires cache. Our simulation results on 8 production storage traces run against 3 different cache configurations confirm that the proposed cache replacement policy is effective and outperforms several state-of-the-art approaches.