memory [15–17] and spin-transfer torque (STT)-magnetic random access memory (MRAM) [18] to emulate the synapse and execute vector-matrix multiplication efficiently. Among the possible candidates for the aforementioned computing architecture, in addition to resistive random access memory (RRAM), phase-change random access memory (PCRAM), and MRAM, nanoscale flash memory [19,20] has shown great promise for the hardware implementation of deep learning owing to its commercialized technology, ultrahigh integration density, and high-speed transmission. [21,22] Recent studies [23–27] show that nanoscale flash memory arrays can improve the computing efficiency of vector-by-matrix multiplication, and a fully connected neural network has been demonstrated. [25] However, hardware realization of fully connected layers alone falls far short of what the multilayer neural networks of deep learning require, since over 90% of their computation takes the form of convolution. [28] In general, an overall technical demonstration of a deep learning neural network implemented on nanoscale flash memory hardware that fully exploits the array configuration is still lacking.

Here, we propose and demonstrate a new computing paradigm with a hardware implementation of the convolution, pooling, and fully connected layers of a deep neural network (DNN) based on the nanoscale flash computing array (NFCA), a universal and reconfigurable scheme that can be massively fabricated. Multiple NFCAs combined with independent data processing blocks make the scheme scalable, so that hardware DNNs can be constructed flexibly. We also present a low-cost, facile programming methodology that achieves precise tuning of flash cells with small variability; via this precise tuning of the threshold voltage, the fabricated 65 nm flash cells exhibit 16 levels (four bits) of storage states. The parallel computing of the preprogrammed NFCA leads to significant gains in speed and energy efficiency. Furthermore, a five-layer DNN is simulated in SPICE (simulation program with integrated circuit emphasis) using measured data from the fabricated 65 nm NOR-type flash memory to recognize the MNIST (Modified National Institute of Standards and Technology) handwritten digit database, and 97.8% recognition accuracy is achieved. Moreover, an optimized design of the DNN structure is proposed to decrease the energy consumption and hardware cost.
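To make the in-array vector-by-matrix multiplication concrete, the sketch below shows, in plain Python/NumPy, how a weight matrix could be mapped onto cell conductances and how the bitline currents then realize the dot products via Kirchhoff's current law. The conductance window, the differential two-cells-per-weight encoding, and all names (`weights_to_conductance`, `nfca_vmm`, `G_MIN`, `G_MAX`) are illustrative assumptions, not the paper's circuit.

```python
import numpy as np

# Hypothetical conductance window of a tuned flash cell (siemens).
G_MIN, G_MAX = 1e-9, 1e-6

def weights_to_conductance(W):
    """Map signed weights onto a differential pair of cells per weight.
    Positive weights go to the 'plus' column, negatives to the 'minus'
    column; the linear scale is an assumption for illustration."""
    scale = (G_MAX - G_MIN) / np.abs(W).max()
    G_pos = G_MIN + scale * np.clip(W, 0, None)
    G_neg = G_MIN + scale * np.clip(-W, 0, None)
    return G_pos, G_neg

def nfca_vmm(v_in, G_pos, G_neg):
    """Analog vector-by-matrix multiply: each bitline sums its cell
    currents I = G * V (Kirchhoff's current law), and the differential
    readout recovers the signed products up to the mapping scale."""
    return v_in @ G_pos - v_in @ G_neg

# Example: a 4-input, 3-output synaptic layer evaluated in one step.
W = np.random.randn(4, 3)
G_pos, G_neg = weights_to_conductance(W)
v = np.array([0.1, 0.2, 0.0, 0.3])   # input activations as voltages
print(nfca_vmm(v, G_pos, G_neg))     # proportional to v @ W
```

Because the fixed offset `G_MIN` appears in both columns, it cancels in the differential readout, so the bitline current difference is exactly proportional to the ideal product `v @ W`.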
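Since each tuned cell stores one of 16 threshold-voltage levels (four bits), the weights loaded into an NFCA are necessarily quantized. The snippet below is a software-side sketch of such 4-bit uniform quantization; it stands in for, and should not be confused with, the paper's threshold-voltage programming methodology.

```python
import numpy as np

def quantize_weights(W, n_levels=16):
    """Uniformly quantize weights to n_levels discrete values, mirroring
    the 16 storage states (four bits) of a precisely tuned flash cell.
    Uniform level spacing is an assumption made for illustration."""
    w_min, w_max = W.min(), W.max()
    step = (w_max - w_min) / (n_levels - 1)
    return w_min + step * np.round((W - w_min) / step)
```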
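Finally, a five-layer convolution/pooling/fully-connected topology of the kind evaluated in the SPICE simulation can be expressed compactly in PyTorch for software-side training before the weights are programmed into the arrays. The paper does not state the exact layer sizes, so the channel counts and kernel sizes below are illustrative placeholders.

```python
import torch.nn as nn

# Illustrative five-layer MNIST network (28x28 grayscale input):
# conv -> pool -> conv -> pool -> fully connected.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # layer 1: convolution, 28x28 -> 24x24
    nn.ReLU(),
    nn.AvgPool2d(2),                  # layer 2: pooling, 24x24 -> 12x12
    nn.Conv2d(6, 12, kernel_size=5),  # layer 3: convolution, 12x12 -> 8x8
    nn.ReLU(),
    nn.AvgPool2d(2),                  # layer 4: pooling, 8x8 -> 4x4
    nn.Flatten(),
    nn.Linear(12 * 4 * 4, 10),        # layer 5: fully connected, 10 digits
)
```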