SOLO: Search Online, Learn Offline for Combinatorial Optimization Problems

Oren, Joel; Ross, Chana; Lefarov, Maksym; Richter, Felix; Taitler, Ayal; Feldman, Zohar; Daniel, Christian; Di Castro, Dotan

doi:10.1609/socs.v12i1.18556

Cited by 10 publications

(7 citation statements)

References 15 publications


“…In Wu et al. [159], the AM [9] was adapted together with some RNN units to solve the time‐dependent TSPTW variant. In Oren et al.…”

Section: Vehicle Routing Problems (mentioning)

confidence: 99%

“…In Oren et al. [160], the deep Q‐learning was explored together with an improved Monte Carlo Tree Search (MCTS) to solve online CVRP. However, the problem scales of the aforementioned works remain rather small.…”

Section: Vehicle Routing Problems (mentioning)

confidence: 99%

“…In Wu et al [159], the AM [9] was adapted together with some RNN units to solve the time-dependent TSPTW variant. In Oren et al [160], the deep Q-learning was explored together with an improved Monte Carlo Tree Search (MCTS) to solve online CVRP. However, the problem scales of the aforementioned works remain rather small.…”

Section: Online/dynamic Routing Problems (mentioning)

confidence: 99%

“…However, for NP‐hard routing problems, constructing solutions can easily fall into the local minimum, which renders the needs of combining other post‐processing techniques, such as: (1) sampling (e.g. in refs [8, 9, 143, 144, 150, 152, 154–159, 167]) that repeats the stochastic construction process for multiple times, (2) beam search (e.g. in refs [148]) that performs an additional best‐first search process, (3) AS (e.g.…”

Section: Vehicle Routing Problems (mentioning)

confidence: 99%


A review on learning to solve combinatorial optimisation problems in manufacturing

Zhang et al. 2023

IET Collab Intel Manufact

An efficient manufacturing system is key to maintaining a healthy economy today. With the rapid development of science and technology and the progress of human society, the modern manufacturing system is becoming increasingly complex, posing new challenges to both academia and industry. Ever since the beginning of industrialisation, leaps in manufacturing technology have always accompanied technological breakthroughs from other fields, for example, mechanics, physics, and computational science. Recently, machine learning (ML) technology, one of the crucial subjects of artificial intelligence, has made remarkable progress in many areas. This study thoroughly reviews how ML, specifically deep (reinforcement) learning, motivates new ideas for addressing challenging problems in manufacturing systems. We collect the literature targeting three aspects: scheduling, packing, and routing, which correspond to three pivotal cooperative production links of today's manufacturing system, that is, production, packing, and logistics respectively. For each aspect, we first present and discuss the state-of-the-art research. Then we summarise and analyse the development trends and point out future research opportunities and challenges.

Keywords: bin packing, combinatorial optimisation, deep reinforcement learning, job shop scheduling, manufacturing systems, vehicle routing

Introduction

Combinatorial optimisation problems (COPs), as one important branch of mathematical optimisation, have practical applications in many fields, such as communication, transportation, and manufacturing, and have aroused broad research interest in industrial engineering, computer science, and operations research. Due to their NP (non-deterministic polynomial-time) hardness, finding their optimal solutions is challenging. Specifically, the discrete solution space in COPs renders the optimisation less efficient, without the guidance of gradients as in continuous optimisation. Meanwhile, the complexity of searching for the (near-)optimal solution(s) among feasible solutions could increase exponentially as the problem scale grows. Classic methods, including exact algorithms and (meta-)heuristics, generally depend on massive expertise and tuning work to solve specific problems. They are…

Cong Zhang, Yaoxin Wu, and Yining Ma contributed equally.


“…Algorithms such as Taboo search [Taillard, 1994], simulated Annealing [Van Laarhoven et al, 1992], genetic algorithms and particle swarm optimization [Pezzella et al, 2008] have proven to solve the problem, but lack in either computation time or generalization capabilities. Advancements in DRL approaches in recent years have enabled considerable progress for the domain of COP applications [Cappart et al, 2021, Oren et al, 2021]. Some of the major COPs have been successfully solved using DRL such as the Travelling Salesman Problem (TSP) [Zhang et al, 2021, d O Costa et al, 2020, Zhang et al, 2020b], the Knapsack Problem [Afshar et al, 2020, Cappart et al, 2021] and the Steiner Tree Problem [Du et al, 2021].…”

Section: Related Work (mentioning)

confidence: 99%

A Reinforcement Learning Approach for Scheduling Problems With Improved Generalization Through Order Swapping

Vivekanandan, Wirth, Karlbauer et al. 2023

Preprint

The scheduling of production resources (such as assigning jobs to machines) plays a vital role in the manufacturing industry, not only for saving energy but also for increasing overall efficiency. Among the different job scheduling problems, the Job Shop Scheduling Problem (JSSP) is addressed in this work. JSSP falls into the category of NP-hard Combinatorial Optimization Problems (COPs), for which solving the problem through exhaustive search becomes infeasible. Simple heuristics such as First In, First Out (FIFO) and Largest Processing Time First (LPT), and metaheuristics such as Taboo search, are often adopted to solve the problem by truncating the search space. These methods become impractical for large problem sizes, as their solutions are either far from the optimum or time consuming to compute. In recent years, research on using Deep Reinforcement Learning (DRL) to solve COPs has gained interest and has shown promising results in terms of solution quality and computational efficiency. In this work, we provide a novel approach to solving the JSSP using DRL, with generalization and solution effectiveness as the objectives. In particular, we employ the Proximal Policy Optimization (PPO) algorithm, which adopts the policy-gradient paradigm and is found to perform well in the constrained dispatching of jobs. We incorporated an Order Swapping Mechanism (OSM) in the environment to achieve better generalized learning of the problem. The performance of the presented approach is analyzed in depth by using a set of available benchmark instances and comparing our results with the work of other groups.
