Bagging is one of the well-known ensemble learning methods, which combines several classifiers trained on different subsamples of the dataset. However, a drawback of bagging is its random selection, where the classification performance depends on chance to choose a suitable subset of training objects. This paper proposes a novel modified version of bagging, named enhanced Bagging (eBagging), which uses a new mechanism (error-based bootstrapping) when constructing training sets in order to cope with this problem. In the experimental setting, the proposed eBagging technique was tested on 33 well-known benchmark datasets and compared with both bagging, random forest and boosting techniques using well-known classification algorithms: Support Vector Machines (SVM), decision trees (C4.5), k-Nearest Neighbour (kNN) and Naive Bayes (NB). The results show that eBagging outperforms its counterparts by classifying the data points more accurately while reducing the training error
Air pollution, which is the result of the urbanization brought by modern life, has a dramatic impact on the global scale as well as local and regional scales. Since air pollution has important effects on human health and other living things, the issue of air quality is of great importance all over the world. Accordingly, many studies based on classification, clustering and association rule mining applications for air pollution have been proposed in the field of data mining and machine learning to extract hidden knowledge from environmental parameters. One approach is to model a region in a way that cities having similar characteristics are determined and placed into the same clusters. Instead of using traditional clustering algorithms, a novel algorithm, named Majority Voting based Multi-Task Clustering (MV-MTC), is proposed and utilized to consider multiple air pollutants jointly. Experimental studies showed that the proposed method is superior to five well-known clustering algorithms: K-Means, Expectation Maximization, Canopy, Farthest First and Hierarchical clustering methods.
Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from environmental sciences. This chapter proposes ensemble methods in environmental data mining that combines the outputs from multiple classification models to obtain better results than the outputs that could be obtained by an individual model. The study presented in this chapter focuses on several ensemble strategies in addition to the standard single classifiers such as decision tree, naive Bayes, support vector machine, and k-nearest neighbor (KNN), popularly used in literature. This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, ensemble methods are tested on different real-world environmental datasets in various subjects such as air, ecology, rainfall, and soil.
Through the use of internet of things-based sensors in air quality monitoring stations, concentration of different pollutants and meteorological parameters can be regularly measured. In case of unusual conditions (e.g., increased levels of dangerous pollutants), a smart assessment system can produce warning so that appropriate air quality management process can be initiated. In this context, the objective of this study is to discover relationships and patterns among air pollution features and characteristics. In this case, determination of frequently observed association rules can trigger an appropriate background smart environment system when a critical situation is detected. In the experimental studies in the current project, traditional association rule mining and weighted association rule mining methods have been employed using real-world datasets collected from 21 monitoring stations in Turkey. In consequence, useful and outstanding association rules exceeding the user-defined support and confidence levels were obtained that can form basis for further research.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.