Recent work using deep learning to solve routing problems such as the traveling salesman problem (TSP) has focused on learning construction heuristics. Such approaches find good-quality solutions but require additional procedures such as beam search and sampling to improve solutions and achieve state-of-the-art performance. However, few studies have focused on improvement heuristics, where a given solution is improved until it reaches a near-optimal one. In this work, we propose to learn a local search heuristic based on 2-opt operators via deep reinforcement learning. We propose a policy gradient algorithm to learn a stochastic policy that selects 2-opt operations given a current solution. Moreover, we introduce a policy neural network that leverages a pointing attention mechanism, which can be easily extended to more general k-opt moves. Our results show that the learned policies can improve even over random initial solutions and approach near-optimal solutions faster than previous state-of-the-art deep learning methods for the TSP. We also show that the proposed method can be adapted to two extensions of the TSP, the multiple TSP and the Vehicle Routing Problem, achieving results on par with classical heuristics and learned methods.
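For readers unfamiliar with the 2-opt operator mentioned in the abstract above, the following is a minimal sketch of a single 2-opt move on a TSP tour. It is not the authors' method: in the paper a learned policy selects which move to apply, whereas here the indices are picked by hand. The function names `tour_length` and `two_opt_move` and the four-point instance are illustrative assumptions.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour that visits points in the given order."""
    return sum(
        math.dist(points[tour[k]], points[tour[(k + 1) % len(tour)]])
        for k in range(len(tour))
    )


def two_opt_move(tour, i, j):
    """Return a new tour with the segment tour[i..j] reversed (one 2-opt move).

    Reversing the segment replaces the edges (tour[i-1], tour[i]) and
    (tour[j], tour[j+1]) with (tour[i-1], tour[j]) and (tour[i], tour[j+1]).
    """
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]


# Hypothetical instance: the corners of a unit square, visited in an
# order whose edges cross along the two diagonals.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
tour = [0, 1, 2, 3]

# Indices chosen by hand here; a learned policy would choose them instead.
improved = two_opt_move(tour, 1, 2)

print(tour_length(points, tour), tour_length(points, improved))
```

In this toy instance the move removes the two crossing diagonal edges and leaves a tour that follows the perimeter of the square.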
We propose a Genetic Algorithm (GA) to address a Green Vehicle Routing Problem (G-VRP). Unlike classic formulations of the VRP, this study aims to minimise the CO2 emissions per route. The G-VRP is of interest to policy makers who wish to reduce greenhouse gas emissions. The GA is tested on a suite of benchmark and real-world instances, which include road speed and gradient data. Our solution approach incorporates elements of local and population search heuristics. Solutions are compared with routes currently used by drivers in a courier company. Reductions in emissions are achieved without incurring additional operational costs.
In this paper we study a repeated posted-price auction between a single seller and a single buyer who interact for a finite number of periods or rounds. In each round, the seller offers the same item for sale to the buyer. The seller announces a price, and the buyer decides whether or not to buy the item at that price. We study the problem from the perspective of the buyer, who only observes a stochastic measurement of the item's valuation after he buys it. Furthermore, in our model the buyer uses fuzzy sets to describe both his satisfaction with the observed valuations and his dissatisfaction with the observed price. The buyer makes decisions based on the probability of a fuzzy event: his decision to buy or not depends on whether the satisfaction from having a high enough valuation for the item outweighs the dissatisfaction with the quoted price. We propose an algorithm based on Thompson Sampling and demonstrate through numerical experiments that it performs well.
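The "probability of a fuzzy event" referred to above can be illustrated with a small sketch. It assumes the standard definition due to Zadeh, in which the probability of a fuzzy event is the expected value of its membership function; the abstract does not say whether the paper uses exactly this form. The membership function `satisfaction`, its thresholds, and the three-point valuation distribution are hypothetical and chosen only for illustration.

```python
def fuzzy_event_probability(outcomes, probs, membership):
    """Probability of a fuzzy event over a discrete distribution.

    Computed as the expectation of the membership function:
    sum over outcomes x of P(x) * membership(x).
    """
    return sum(p * membership(x) for x, p in zip(outcomes, probs))


def satisfaction(valuation, low=2.0, high=6.0):
    """Hypothetical membership function for "the valuation is satisfactory".

    Rises linearly from 0 at `low` to 1 at `high`, clipped to [0, 1].
    """
    return min(1.0, max(0.0, (valuation - low) / (high - low)))


# Hypothetical discrete distribution of observed valuations.
valuations = [2.0, 4.0, 6.0]
probs = [0.25, 0.5, 0.25]

p_satisfied = fuzzy_event_probability(valuations, probs, satisfaction)
print(p_satisfied)
```

Under this reading, the buyer would compute one such probability for satisfaction with the valuation and another for dissatisfaction with the price, then compare the two; how the paper combines them is not specified in the abstract.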