“…In our work, we opt for a more practical definition based on [2], where decisions are irreversible and items are delivered in sequence, one by one (online), such that we give special attention to the immediate items B ⊂ I. In practice, a conveyor belt carries the item sequence to a robotic arm located at the head of the line.…”

*This work is supported, in part, by the "3D Bin Packing with Deep Reinforcement Learning" project funded by Hyundai Robotics Co. Ltd.; in part, by the "Edge Brain Based Intelligent Manufacturing" project, IITP-2022-0-00067; in part, by the AI Graduate School Program, Grant No. 2019-0-00421; and by the ICT Consilience Program, IITP-2020-0-01821, of the Institute of Information and Communication Technology Planning and Evaluation (IITP), sponsored by the Korean Ministry of Science and Information Technology (MSIT).

1 Authors are from the Artificial Intelligence School, Sungkyunkwan University, Suwon, South Korea. * Corresponding author: Sukhan Lee, Lsh1@skku.edu
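The online setting in the excerpt can be illustrated with a minimal sketch. It assumes, purely for illustration, that the immediate items B consist of the item currently at the head of the conveyor plus a fixed number of visible upcoming items. The names `online_stream`, `Item`, and `lookahead` are hypothetical and do not come from the paper.

```python
from collections import deque
from itertools import islice
from typing import Deque, Iterable, Iterator, Tuple

# Hypothetical item representation: (length, width, height) of a box.
Item = Tuple[int, int, int]


def online_stream(
    items: Iterable[Item], lookahead: int
) -> Iterator[Tuple[Item, Tuple[Item, ...]]]:
    """Yield (current item, visible upcoming items), one step at a time.

    The item sequence I is consumed strictly in order and never revisited,
    which mirrors the irreversible, one-by-one delivery on a conveyor belt.
    """
    it = iter(items)
    # Assumption: the window holds the current item plus `lookahead` more.
    window: Deque[Item] = deque(islice(it, lookahead + 1))
    while window:
        current = window.popleft()
        yield current, tuple(window)
        nxt = next(it, None)  # refill the window from the rest of I
        if nxt is not None:
            window.append(nxt)


if __name__ == "__main__":
    sequence = [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4)]
    for current, upcoming in online_stream(sequence, lookahead=2):
        # A packing policy would commit to a placement for `current` here,
        # using `upcoming` only as side information.
        print(current, upcoming)
```

A policy built on such a stream sees only the window at each step and must commit to a placement before the next item arrives. How the paper itself sizes or uses B is not specified in this excerpt.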