“…Specifically, (i) FCSs show on average t_IDW = 0.5 ms and t_2CD = 3.3 ms, whereas CCSs show on average t_IDW = 1.0 ms and t_2CD = 5.1 ms, according to Hoevers et al. [11]; (ii) FCSs show on average t_IDW = 0.9 ms and t_2CD = 6.0 ms, whereas CCSs show on average t_IDW = 1.25 ms and t_2CD = 9.50 ms, according to Cohen et al. [12]; (iii) FCSs show on average t_IDW = 0.7 ms and t_2CD = 5 ms, whereas CCSs show on average t_IDW = 1.5 ms and t_2CD = 10 ms, according to the American Thoracic Society (ATS) [13]. Signal processing and machine learning approaches have long been combined for event detection and classification tasks using spectro-temporal features [14][15][16]. For crackle sound detection specifically, approaches have been proposed based on spectrogram analysis [17,18], autoregressive (AR) models [19,20], the wavelet transform [21][22][23][24], fractal dimension filtering [25][26][27][28], entropy [29,30], empirical mode decomposition (EMD) [31], fuzzy systems [32], Gaussian mixture models (GMMs) [33], logistic regression [34], support vector machines (SVMs) [35][36][37], independent component analysis (ICA) [38], multi-perceptron networks (MPNs) [39], non-negative matrix factorization (NMF) [40], convolutional neural networks (CNNs) [41,42], recurrent neural networks (RNNs) [43,44] and hybrid neural networks [45,46].…”
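The average durations quoted above for FCSs and CCSs under the three criteria [11], [12], [13] can be collected in a small lookup table. The sketch below is only an illustration: the names REFERENCE_MS and nearest_class are hypothetical, and the nearest-average rule with relative distances is an assumption made here for demonstration, not the method of any of the cited works.

```python
import math

# Average durations in ms as (t_IDW, t_2CD), copied from the values quoted
# in the text for each criterion. The dictionary layout is an assumption.
REFERENCE_MS = {
    "Hoevers [11]": {"FCS": (0.5, 3.3), "CCS": (1.0, 5.1)},
    "Cohen [12]": {"FCS": (0.9, 6.0), "CCS": (1.25, 9.5)},
    "ATS [13]": {"FCS": (0.7, 5.0), "CCS": (1.5, 10.0)},
}


def nearest_class(t_idw_ms, t_2cd_ms, criterion="ATS [13]"):
    """Hypothetical helper: return "FCS" or "CCS", whichever reference
    average is closer to the measured (t_IDW, t_2CD) pair."""
    refs = REFERENCE_MS[criterion]

    def rel_dist(ref):
        # Relative differences, so the larger t_2CD values do not
        # dominate the much smaller t_IDW values.
        return math.hypot(
            (t_idw_ms - ref[0]) / ref[0],
            (t_2cd_ms - ref[1]) / ref[1],
        )

    return min(refs, key=lambda label: rel_dist(refs[label]))
```

For example, nearest_class(0.7, 5.0) compares the measurement with the ATS averages by default, while passing criterion="Hoevers [11]" switches to the values reported in [11]. Because the three criteria disagree on the averages, the same crackle may be labelled differently depending on which one is chosen.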