Outlier detection is one of the major data mining methods. This paper proposes a three-step approach to detecting spatio-temporal outliers in large databases: clustering, checking spatial neighbors, and checking temporal neighbors. We introduce a new outlier detection algorithm that finds small groups of data objects that are exceptional relative to the remaining large amount of data. In contrast to existing outlier detection algorithms, the new algorithm can discover outliers according to the non-spatial, spatial, and temporal values of the objects. To demonstrate the new algorithm, the paper also presents an example application using a data warehouse.
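The abstract does not give the algorithm's details, but the three-step idea can be sketched as follows. This is an illustrative, simplified reading: a z-score test on the non-spatial attribute stands in for the clustering step, and the neighborhood definitions (`space_eps`, `time_eps`, the weight `k`) are assumptions, not the paper's parameters.

```python
# Hedged sketch of a three-step spatio-temporal outlier check:
# (1) find candidate outliers from non-spatial attribute values,
# (2) confirm each candidate against its spatial neighbors,
# (3) confirm each candidate against its temporal neighbors.
# A z-score test stands in for the clustering step here.
from statistics import mean, stdev

def st_outliers(records, space_eps=1.0, time_eps=1.0, k=2.0):
    """records: list of dicts with keys x, y, t, value."""
    vals = [r["value"] for r in records]
    mu, sigma = mean(vals), stdev(vals)
    # Step 1: candidates whose value deviates strongly from the bulk.
    candidates = [r for r in records if abs(r["value"] - mu) > k * sigma]

    outliers = []
    for c in candidates:
        # Step 2: spatial neighbors (within space_eps, any time).
        nb_s = [r["value"] for r in records if r is not c
                and abs(r["x"] - c["x"]) <= space_eps
                and abs(r["y"] - c["y"]) <= space_eps]
        # Step 3: temporal neighbors (nearby time slots, same area).
        nb_t = [r["value"] for r in records if r is not c
                and abs(r["t"] - c["t"]) <= time_eps
                and abs(r["x"] - c["x"]) <= space_eps
                and abs(r["y"] - c["y"]) <= space_eps]

        # Keep the candidate only if it also deviates from both neighborhoods.
        def deviates(nb):
            spread = stdev(nb) if len(nb) > 1 else sigma
            return not nb or abs(c["value"] - mean(nb)) > k * spread

        if deviates(nb_s) and deviates(nb_t):
            outliers.append(c)
    return outliers

# Usage: a 3x3 spatial grid observed at 3 time steps, with one extreme value.
records = [{"x": i, "y": j, "t": t, "value": 10 + (i + j) % 3}
           for i in range(3) for j in range(3) for t in range(3)]
records[0]["value"] = 100  # inject a spatio-temporal outlier
out = st_outliers(records)
```

Checking both neighborhoods is what separates a spatio-temporal outlier from a merely globally extreme value: a point is flagged only when it disagrees with its surroundings in space and in time.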
Data mining has proven useful for knowledge discovery in many areas, from marketing to medicine and from banking to education. This study focuses on data mining and machine learning in the textile industry, an emerging interdisciplinary research field. Data mining studies implemented in the textile industry, including classification and clustering techniques and machine learning algorithms, are presented and explained in detail to provide an overview of how these techniques can address problems where traditional methods are not useful. The article shows that classification techniques have attracted more interest than clustering techniques in the textile industry, that the most commonly applied classification methods are artificial neural networks and support vector machines, and that these generally achieve high accuracy in textile applications. For the clustering task of data mining, the K-means algorithm was the one most commonly implemented among the textile studies investigated in this article. We conclude with remarks on the strengths of data mining techniques for the textile industry, ways to overcome certain challenges, and possible directions for further research.
Bagging is a well-known ensemble learning method that combines several classifiers trained on different subsamples of the dataset. A drawback of bagging, however, is its random selection: classification performance depends on chance to yield a suitable subset of training objects. This paper proposes a modified version of bagging, named enhanced Bagging (eBagging), which uses a new mechanism (error-based bootstrapping) when constructing training sets in order to address this problem. In the experimental setting, the proposed eBagging technique was tested on 33 well-known benchmark datasets and compared with bagging, random forest, and boosting using well-known classification algorithms: Support Vector Machines (SVM), decision trees (C4.5), k-Nearest Neighbour (kNN), and Naive Bayes (NB). The results show that eBagging outperforms its counterparts, classifying data points more accurately while reducing the training error.
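The abstract names the mechanism (error-based bootstrapping) but not its exact rule, so the sketch below shows one plausible reading: bootstrap samples are drawn with weights that grow for instances the current ensemble misclassifies, rather than uniformly as in ordinary bagging. The weight-doubling rule and the 1-NN base learner are illustrative assumptions, not the paper's eBagging specification.

```python
# Hedged sketch of error-based bootstrapping: each round biases the
# bootstrap sample toward training points the ensemble so far gets wrong.
# The doubling rule below is an illustrative choice, not the paper's rule.
import random
from collections import Counter

def error_based_bagging(X, y, train, rounds=5, seed=0):
    """train(Xs, ys) -> predict function; returns a majority-vote ensemble."""
    rng = random.Random(seed)
    n = len(X)
    weights = [1.0] * n  # uniform weights = ordinary bagging on round 1
    models = []
    for _ in range(rounds):
        idx = rng.choices(range(n), weights=weights, k=n)  # weighted bootstrap
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))
        # Raise the weight of points the current ensemble misclassifies.
        for i in range(n):
            votes = Counter(m(X[i]) for m in models)
            if votes.most_common(1)[0][0] != y[i]:
                weights[i] *= 2.0

    def predict(q):
        return Counter(m(q) for m in models).most_common(1)[0][0]
    return predict

# Usage with a trivial 1-nearest-neighbour base learner on 1-D data.
def train_1nn(Xs, ys):
    data = list(zip(Xs, ys))
    return lambda q: min(data, key=lambda p: abs(p[0] - q))[1]

X = [0, 1, 2, 3, 10, 11, 12, 13]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ens = error_based_bagging(X, y, train_1nn)
```

The design intent is that, unlike boosting, every member is still a bootstrap-trained classifier combined by an unweighted majority vote; only the sampling distribution departs from bagging's uniform choice.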