Unlike the deep neural network (DNN) inference process, the training process produces a huge amount of intermediate data to compute the updated weights of the network. Generally, the on-chip global buffer (e.g., SRAM cache) has limited capacity because of its low memory density; therefore, off-chip DRAM access is inevitable during training. In this work, a novel ferroelectric field-effect transistor (FeFET) based 3D NAND architecture for an on-chip training accelerator is proposed. The reduced peripheral circuit overhead, owing to the low operating voltage of the FeFET device, together with the ultra-high density of the 3D NAND architecture, enables storing and computing all the intermediate data on chip during training. We present a custom design of a 108 Gb chip with a 59.91 mm² area and 45% array efficiency. Data mapping schemes for weights, activations, and errors that are compatible with the 3D NAND architecture are investigated. The training performance is explored by training a ResNet-18 model on this architecture with the ImageNet dataset at 8-bit precision. Thanks to the minimized off-chip memory access, an energy efficiency of 7.76 TOPS/W is achieved for 8-bit on-chip training.