Recently, sound-based diagnosis systems have received much attention in many fields owing to their simple structure, non-contact measurement, and low power consumption. Accurate information is essential for improving the efficiency of coal production and the safety of the coal mining process. The sound signal produced during the cutting process of a coal mining shearer contains abundant information for cutting pattern identification. In this paper, the original acoustic signal was first collected through an industrial microphone. To analyze the signal in depth, an adaptive Hilbert-Huang transform (HHT) was applied to decompose the sound into several intrinsic mode functions (IMFs), from which 1024 Hilbert marginal spectrum points were acquired. These 1024 points were reorganized as a 32 × 32 feature map. The LeNet-5 convolutional neural network (CNN), with three convolution layers and two sub-sampling layers, was then used as the cutting pattern recognizer. A simulation example with 10,000 training samples and 2000 testing samples was conducted to verify the effectiveness of the proposed method. The trained CNN recognized 1971 of the 2000 testing sound series correctly, giving an identification rate of 98.55%.

Therefore, it is widely applied in fault diagnosis [7,8], target detection [9], feature extraction [10], and so on.

Unfortunately, the original cutting signal acquired from the coal mining field is always nonlinear, nonstationary, and discontinuous, which makes extracting key information from it exceedingly difficult. A powerful signal processing method is therefore one of the keys to solving this problem. However, typical sound signal analysis approaches such as the short-time Fourier transform (STFT), wavelet transform (WT), and wavelet packet transform (WPT) are ill-suited to the cutting sound.
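
The feature-map step and the reported identification rate from the abstract can be illustrated with a minimal sketch. It assumes a row-major layout when reorganizing the 1024 marginal spectrum points into a 32 × 32 map (the text does not state the ordering), and the function names `to_feature_map` and `identification_rate` are hypothetical, introduced only for illustration. The input spectrum is a placeholder, not real HHT output.

```python
from typing import List, Sequence

SIDE = 32  # the feature map is 32 x 32, i.e. 1024 points


def to_feature_map(spectrum: Sequence[float], side: int = SIDE) -> List[List[float]]:
    """Reorganize a flat marginal-spectrum vector into a side x side map.

    Row-major ordering is an assumption; the text does not state the layout.
    """
    if len(spectrum) != side * side:
        raise ValueError(f"expected {side * side} points, got {len(spectrum)}")
    return [list(spectrum[r * side:(r + 1) * side]) for r in range(side)]


def identification_rate(correct: int, total: int) -> float:
    """Percentage of test samples recognized correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Placeholder spectrum (0, 1, ..., 1023) standing in for real HHT output.
feature_map = to_feature_map([float(i) for i in range(SIDE * SIDE)])

# Counts reported in the abstract: 1971 of 2000 test samples recognized.
rate = identification_rate(1971, 2000)
print(len(feature_map), len(feature_map[0]), f"{rate:.2f}%")
```

With the counts from the abstract, the computed rate agrees with the stated 98.55%. A 32 × 32 map also matches the input size that the classic LeNet-5 architecture expects, which is presumably why 1024 points were chosen.
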
Because the cutting sound is strongly nonlinear and nonstationary, the STFT cannot analyze it effectively, being limited by the Dirichlet condition and the Heisenberg uncertainty principle [11,12]. Similarly, the WT and WPT do not work well on the discontinuous cutting sound signal because their wavelet basis is fixed [13]. In 1998, an adaptive decomposition method named the Hilbert-Huang transform (HHT) was established at the National Aeronautics and Space Administration (NASA) [14]. The HHT is composed of empirical mode decomposition (EMD), introduced by Huang et al. in 1998, and the Hilbert transform (HT) [15]. As its basis is adaptive, the HHT is not subject to the restrictions of the previous approaches and has become an attractive tool for fault diagnosis [16], speech recognition [17], signal denoising [18], pattern recognition [19], forecasting [20], and so on.

After decomposing the original sound into a series of time-frequency characteristics, many researchers adopted feature extractors to reduce the dimension and eliminate redundant information. The effectiveness of feature extracting is a key factor in t...