Summary
Devices in the Internet of Things (IoT) generate and gather vast amounts of data. Smart devices with constrained resources cannot execute machine learning (ML) algorithms over such large datasets. Therefore, in this paper, we propose, analyze, and evaluate data reduction at the Fog level. The focus of this paper is twofold: the first objective is to process the reduced datasets with ML, and the second is to discard irrelevant data while preserving the quality of the ML models. The naïve Bayesian classifier is used for model analysis. For data (attribute) reduction, the state‐of‐the‐art approaches of CFS Subset Evaluation, Info Gain Attribute Evaluation, Gain Ratio Attribute Evaluation, and principal component analysis (PCA) are employed. From an implementation point of view, the naïve Bayesian classifier learns the class distribution and the correlation of the classes with the remaining features. A naïve Bayesian model is first generated from the full dataset without feature reduction. The feature selection algorithms then reduce the number of features by 50%, and a second naïve Bayesian model is generated from the reduced set, giving us a benchmark for comparison against the original, unreduced dataset. From the results, we find that the performance with the reduced feature set either improves or remains almost identical to that with the full set of features. With this comprehensive experimental evaluation, we believe that data reduction can provide a blueprint for avoiding unnecessary data storage and processing. Since feature reduction does not drastically affect the performance of the ML models, the results demonstrate the efficacy of the reduced models.
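The evaluation pipeline described above (train naïve Bayes on the full feature set, select the top 50% of features, retrain, and compare) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses synthetic data, a hand-rolled Gaussian naïve Bayes, and a simple class-mean-separation score as a stand-in for the CFS/Info Gain/Gain Ratio/PCA evaluators named in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary-class data: first 4 features are informative
# (class 1 mean shifted by 1.5), last 4 are pure noise.
n = 400
y = rng.integers(0, 2, n)
X_inf = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 4))
X_noise = rng.normal(size=(n, 4))
X = np.hstack([X_inf, X_noise])

def fit_gnb(X, y):
    """Per-class feature means, variances, and priors for Gaussian naive Bayes."""
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(0), Xc.var(0) + 1e-9, len(Xc) / len(y))
    return stats

def predict_gnb(stats, X):
    """Predict the class with the highest Gaussian log-likelihood plus log-prior."""
    classes = list(stats.keys())
    scores = []
    for c in classes:
        mu, var, prior = stats[c]
        ll = -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(1)
        scores.append(ll + np.log(prior))
    return np.array(classes)[np.argmax(np.stack(scores, 1), 1)]

# Score each feature by standardized class-mean separation (an illustrative
# stand-in for the attribute evaluators) and keep the top 50% of features.
mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
score = np.abs(mu0 - mu1) / X.std(0)
keep = np.argsort(score)[-X.shape[1] // 2:]

# Compare model quality on the full versus the 50%-reduced feature set.
acc_full = (predict_gnb(fit_gnb(X, y), X) == y).mean()
acc_half = (predict_gnb(fit_gnb(X[:, keep], y), X[:, keep]) == y).mean()
print(f"full: {acc_full:.2f}  reduced: {acc_half:.2f}")
```

With informative and noise features this clearly separated, the reduced model tends to match the full model, mirroring the paper's finding that halving the attributes leaves naïve Bayesian performance largely intact.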