Deep neural networks have been demonstrated to be useful in a variety of intelligent tasks, and various specialized NN accelerators have been proposed recently to improve hardware efficiency. These accelerators are typically equipped with software-managed scratchpad memory (SPM) for high performance and energy efficiency. However, traditional SPM management techniques cause memory fragmentation on NN accelerators and thus lead to low utilization of the precious SPM. The main reason is that traditional techniques were originally designed for managing fixed-length registers rather than variable-length memory blocks. In this article, we propose a novel SPM management approach for NN accelerators. The basic intuition is that NN computation and memory behaviors are predictable and relatively regular compared with those of traditional applications, so most information can be determined at compile time. In addition, by exploiting the variable-length feature of SPM, we propose to divide the allocation process into two passes, a space assignment pass and an address assignment pass, which are performed simultaneously (and implicitly) in traditional one-pass allocation techniques. Experimental results on the memory requests of a representative NN accelerator demonstrate that the proposed approach reduces memory consumption by up to 30% compared with state-of-the-art SPM management techniques, and the resulting memory usage is only 2% larger than that of the theoretical optimal allocation.

KEYWORDS
deep neural network, memory management, scratchpad memory

1 INTRODUCTION

Deep neural networks (DNNs) have been widely used in various applications, such as computer vision,1 speech recognition,2 machine translation,3 and robotics,4 due to their improved accuracy over traditional machine learning approaches. However, the performance benefits of DNNs come at the cost of extremely high computation and memory complexity, which poses great challenges to the underlying hardware architecture. To improve the efficiency of DNN processing, various specialized accelerators have been proposed that deliver orders of magnitude better performance and energy efficiency than general-purpose architectures such as CPUs and GPUs.5,6

Specialized neural network accelerators typically require various novel architectural components, including control logic (e.g., DianNao5 employs a control processor with dedicated control instructions, and Eyeriss6 employs a two-level control hierarchy), computational units (e.g., DianNao employs 16-bit fixed-point functional units to leverage the error-tolerance features of intelligent applications), and memory hierarchies.
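To make the two-pass idea from the abstract concrete, the following is a minimal sketch of how such a split might look, assuming each variable-length SPM block's size and lifetime are known at compile time from the NN's static schedule. The first pass records which blocks are live simultaneously (space assignment); the second pass greedily places each block at the lowest conflict-free address (address assignment). All names (Block, space_assignment, address_assignment) and the greedy first-fit placement policy are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical illustration of two-pass SPM allocation; sizes and
# lifetimes are assumed to be known at compile time.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    size: int        # bytes requested in SPM
    first_use: int   # first operation index touching the block
    last_use: int    # last operation index touching the block
    offset: int = -1 # SPM base address, filled in by pass 2

def overlap(a: Block, b: Block) -> bool:
    """Two blocks conflict iff their lifetimes intersect."""
    return not (a.last_use < b.first_use or b.last_use < a.first_use)

def space_assignment(blocks):
    """Pass 1: from the compile-time schedule, record which pairs of
    variable-length blocks are live at the same time and therefore
    must occupy disjoint SPM ranges."""
    return {b.name: {o.name for o in blocks if o is not b and overlap(b, o)}
            for b in blocks}

def address_assignment(blocks, conflicts):
    """Pass 2: greedy first-fit; place each block at the lowest
    address not occupied by an already-placed conflicting block."""
    placed = {}
    for blk in sorted(blocks, key=lambda b: b.first_use):
        busy = sorted((p.offset, p.offset + p.size)
                      for n, p in placed.items() if n in conflicts[blk.name])
        addr = 0
        for lo, hi in busy:
            if addr + blk.size <= lo:
                break            # blk fits in the gap before this range
            addr = max(addr, hi) # otherwise skip past the busy range
        blk.offset = addr
        placed[blk.name] = blk
    return max(b.offset + b.size for b in placed.values())  # SPM footprint

blocks = [Block("input", 256, 0, 1), Block("weights", 512, 0, 3),
          Block("output", 256, 2, 5)]
print(address_assignment(blocks, space_assignment(blocks)))
# -> 768: "output" reuses the bytes of the already-dead "input" block.
```

In this toy example the total footprint is 768 bytes rather than the 1024 bytes a naive per-block reservation would need, illustrating how compile-time lifetime information lets variable-length blocks share SPM space.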