“…Since then, multiple off-the-shelf CNN backbones have been widely applied to KWS tasks, such as deep residual networks (ResNet) [2], separable CNNs [3,4,5,6], temporal CNNs [7] and SincNet [8]. Other efforts boost the performance of CNN models for KWS by combining them with additional deep learning components, such as recurrent neural networks (RNN) [9], bidirectional long short-term memory (BiLSTM) [10] and streaming layers [11]. However, although the off-the-shelf CNN backbones on which existing KWS studies usually rely have been demonstrated to be effective in image classification, they were not designed specifically for KWS and may not be the optimal architectures for it.…”
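One reason separable CNNs are popular for KWS is their small parameter footprint, which matters for always-on, on-device keyword detection. The sketch below compares the parameter count of a standard convolution with that of a depthwise separable convolution (depthwise filter plus 1×1 pointwise mixing); the layer shape is an illustrative assumption, not taken from any cited model.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels
    (bias omitted)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Assumed layer shape for illustration only.
c_in, c_out, k = 64, 64, 3
std = standard_conv_params(c_in, c_out, k)   # 36864
sep = separable_conv_params(c_in, c_out, k)  # 4672
print(std, sep, round(std / sep, 1))         # roughly a 7.9x reduction
```

The ratio grows with kernel size and output channels, which is why separable convolutions are a common building block in compact audio models.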