| Problem | Method | Type |
| --- | --- | --- |
| … | Pointer network [66] | Supervised, Approximation |
| | GCN + Search [31] | Supervised, Approximation |
| | Q-Learning + GNN [19] | Model-free, Value-based |
| | Hierarchical RL + GAT [44] | Model-free, Policy-based |
| | REINFORCE + LSTM with attention [47] | Model-free, Policy-based |
| | REINFORCE + attention [20] | Model-free, Policy-based |
| | RL + GAT [36] | Model-free, Policy-based |
| | DDPG [23] | Model-free, Policy-based |
| | REINFORCE + Pointer network [10] | Model-free, Policy-based |
| | RL + NN [45] | Model-free, Actor-Critic |
| | RL + GAT [14] | Model-free, Actor-Critic |
| | AlphaZero: MCTS + GCN [51] | Model-based, Given model |
| Knapsack Problem | REINFORCE + Pointer network [10] | Model-free, Policy-based |
| Bin Packing Problem (BPP) | REINFORCE + LSTM [29] | Model-free, Policy-based |
| | AlphaZero: MCTS + NN [38] | Model-based, Given model |
| Job Scheduling Problem (JSP) | RL + LSTM [16] | Model-free, Actor-Critic |
| Vehicle Routing Problem (VRP) | REINFORCE + LSTM with attention [47] | Model-free, Policy-based |
| | RL + LSTM [16] | Model-free, Policy-based |
| | RL + GAT [36] | Model-free, Policy-based |
| | RL + NN [43] | Model-free, Policy-based |
| | RL + GAT [25] | Model-free, Actor-Critic |
| Global Routing | DQN + MLP [40] | Model-free, Value-based |
| Highest Safe Rung (HSR) | AlphaZero: MCTS + CNN [71] | Model-based, Given model |

Table 3: Classification of ML approaches for NP-hard combinatorial optimization by problem, method, and type.
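Many of the "Model-free, Policy-based" entries above train with REINFORCE, the score-function policy-gradient estimator. As a minimal illustrative sketch (not taken from any of the cited papers), the update can be shown on a toy problem with a softmax policy over a few discrete actions; the reward values and learning rates here are arbitrary choices for the example:

```python
import numpy as np

# Illustrative REINFORCE sketch: a softmax policy over 3 actions,
# trained with the log-derivative gradient and a running-mean baseline.
# Rewards and hyperparameters are hypothetical, chosen for the demo.

rng = np.random.default_rng(0)
rewards = np.array([1.0, 0.2, 0.1])  # assumed per-action rewards
theta = np.zeros(3)                  # policy logits

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr, baseline = 0.1, 0.0
for step in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)           # sample an action from the policy
    r = rewards[a]
    # For a softmax policy: grad log pi(a) = one_hot(a) - p
    grad_log_pi = -p
    grad_log_pi[a] += 1.0
    theta += lr * (r - baseline) * grad_log_pi  # REINFORCE update
    baseline += 0.05 * (r - baseline)           # variance-reducing baseline

print(int(np.argmax(theta)))  # the policy should come to favor action 0
```

The methods in the table replace the toy softmax with a learned encoder (pointer network, LSTM, GAT, etc.) that parameterizes the action distribution over problem elements (e.g. the next city or customer to visit), but the gradient estimator is the same.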