Abstract:The study tested a data mining engine (PARACUDA ® ) to predict various soil attributes (BC, CEC, BS, pH, C org , Pb, Hg, As, Zn and Cu) using reflectance data acquired for both optical and thermal infrared regions. The engine was designed to utilize large data in parallel and automatic processing to build and process hundreds of diverse models in a unified manner while avoiding bias and deviations caused by the operator(s). The system is able to systematically assess the effect of diverse preprocessing techniques; additionally, it analyses other parameters, such as different spectral resolutions and spectral coverages that affect soil properties. Accordingly, the system was used to extract models across both optical and thermal infrared spectral regions, which holds significant chromophores. In total, 2880 models were evaluated where each model was generated with a different preprocessing scheme of the input spectral data. The models were assessed using statistical parameters such as coefficient of determination (R 2 ), square error of prediction (SEP), relative percentage difference (RPD) and by physical explanation (spectral assignments). It was found that the smoothing procedure is the most beneficial preprocessing stage, especially when combined with spectral derivation (1st or 2nd derivatives). Automatically and without the need of an operator, the data mining engine enabled the best prediction models to be found from all the combinations tested. Furthermore, the data mining approach used in this study and its processing scheme proved to be efficient tools for getting a better understanding of the geochemical properties of the samples studied (e.g., mineral associations).