Quantiles play an important role in data analysis. On-line estimation of quantiles for streaming data, i.e. data arriving step by step over time, is not as simple as incremental or recursive estimation of characteristics like the mean (expected value) or the variance, especially on devices with limited memory and computation capacity such as electronic control units. In this paper, we propose an algorithm for incremental quantile estimation that overcomes restrictions of previously described techniques. We also develop a statistical test that detects changes in the data stream, so that the on-line estimation of the quantiles can be carried out in an adaptive or evolving manner. Besides a statistical analysis of our algorithm, we also provide experimental results comparing our algorithm with a recursive quantile estimation technique that is restricted to continuous random variables.
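For orientation, recursive quantile estimators of the kind used as the comparison baseline are commonly built on stochastic approximation. The Python sketch below shows that classic recursion, q_n = q_{n-1} + (c/n)(p - 1[x_n <= q_{n-1}]); it is not the algorithm proposed in the paper, whose details the abstract does not give. The class name RecursiveQuantile, the gain c and the 1/n step schedule are illustrative assumptions. The recursion searches for a point where F(q) = p, which need not exist when the distribution function F has jumps; this is one reason such techniques are usually stated for continuous random variables.

```python
import random


class RecursiveQuantile:
    """Stochastic-approximation estimate of the p-quantile of a data stream.

    Stores only the current estimate and the number of observations seen,
    so memory use is constant. The step size gain / n is an illustrative
    choice, not a tuned schedule.
    """

    def __init__(self, p, gain=1.0):
        if not 0.0 < p < 1.0:
            raise ValueError("p must lie strictly between 0 and 1")
        self.p = p
        self.gain = gain
        self.n = 0
        self.q = None

    def update(self, x):
        """Consume one observation and return the updated estimate."""
        self.n += 1
        if self.q is None:
            self.q = float(x)  # initialise with the first observation
        else:
            below = 1.0 if x <= self.q else 0.0
            # move up by gain*p/n if x > q, down by gain*(1-p)/n otherwise
            self.q += (self.gain / self.n) * (self.p - below)
        return self.q


# Usage: estimate the median of a uniform(0, 1) stream one value at a time.
rng = random.Random(0)
median = RecursiveQuantile(0.5)
for _ in range(100_000):
    median.update(rng.random())
print(median.q)  # should be close to 0.5
```

Each update costs a comparison and an addition, which is what makes this family attractive for memory-constrained devices; the price is that the estimate depends on the step-size schedule.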
Effects of drift and noise on the optimal sliding window size for data stream regression models (2016, 46 (10))
The analysis of non-stationary data streams requires continuous adaptation of the model to the most recent relevant data. This requires that changes in the data stream be distinguished from noise. Many approaches rely on heuristic adaptation schemes. We analyse simple regression models to understand the joint effects of noise and concept drift and derive the optimal sliding window size for these regression models. Our theoretical analysis and simulations show that choosing a near-optimal window size can be crucial. Our models can serve as benchmarks for assessing how other models cope with noise and drift.
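The window-size trade-off can be made concrete with a least-squares line refitted over a fixed-length sliding window. The sketch below is an illustration under assumptions (one input variable, a noise-free toy stream with one abrupt drift, a hypothetical SlidingWindowRegression class); it is not one of the models analysed in the paper. A short window forgets pre-drift data quickly but averages over fewer points, so noise has more influence; a long window does the opposite.

```python
from collections import deque


class SlidingWindowRegression:
    """Least-squares line y = a + b*x fitted to the most recent `window` points."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must contain at least two points")
        self.points = deque(maxlen=window)  # oldest point drops out automatically

    def update(self, x, y):
        self.points.append((x, y))

    def fit(self):
        """Return (intercept, slope) for the points currently in the window."""
        n = len(self.points)
        if n < 2:
            raise ValueError("need at least two points to fit a line")
        mean_x = sum(x for x, _ in self.points) / n
        mean_y = sum(y for _, y in self.points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in self.points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in self.points)
        if sxx == 0:
            raise ValueError("all x values in the window are equal")
        slope = sxy / sxx
        return mean_y - slope * mean_x, slope


# Toy stream with an abrupt drift at t = 50: slope +1 before, slope -2 after.
short, long_ = SlidingWindowRegression(10), SlidingWindowRegression(100)
for t in range(70):
    y = float(t) if t < 50 else 150.0 - 2.0 * t
    short.update(t, y)
    long_.update(t, y)
print(short.fit())  # -> (150.0, -2.0), fitted only on post-drift points
print(long_.fit())  # still mixes both regimes
```

On this noise-free stream the short window recovers the post-drift line exactly, while the long window still reports a positive slope; adding noise to `y` would show the opposite weakness of the short window.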
Abstract. Determining the number of clusters is a crucial problem in cluster analysis. Cluster validity measures are one way to find the optimal number of clusters, especially for prototype-based clustering. However, no validity measure works well in all cases. In this paper, we propose an approach to determining the number of clusters based on the minimum description length principle, which has low computational cost and is also applicable to fuzzy clustering.
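The abstract does not state the coding scheme, so the following Python sketch only illustrates the general minimum-description-length idea under explicit assumptions: hard k-means rather than fuzzy clustering, a shared spherical Gaussian for the residuals, and a cost of (1/2) log n nats per real-valued parameter, as in BIC-style approximations. The total code length is parameter cost plus cluster-label cost plus residual cost, and the selected number of clusters minimises it. All function names are hypothetical.

```python
import math
import random


def _sqdist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _assign(points, centers):
    """Index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda j: _sqdist(p, centers[j]))
            for p in points]


def kmeans(points, k, iters=100):
    """Hard k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # next centre: the point farthest from every centre chosen so far
        centers.append(max(points, key=lambda p: min(_sqdist(p, c) for c in centers)))
    for _ in range(iters):
        labels = _assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers, _assign(points, centers)


def description_length(points, k):
    """Crude two-part code length in nats: parameters + labels + residuals."""
    n, d = len(points), len(points[0])
    centers, labels = kmeans(points, k)
    sse = sum(_sqdist(p, centers[lab]) for p, lab in zip(points, labels))
    var = max(sse / (n * d), 1e-12)  # shared spherical variance (ML estimate)
    counts = [labels.count(j) for j in range(k)]
    label_cost = -sum(c * math.log(c / n) for c in counts if c > 0)
    n_params = k * d + (k - 1) + 1  # centres, mixing proportions, variance
    param_cost = 0.5 * n_params * math.log(n)
    # Gaussian negative log-likelihood with terms that are equal for every k dropped
    data_cost = 0.5 * n * d * math.log(var)
    return param_cost + label_cost + data_cost


def choose_k(points, k_max):
    """Number of clusters with the shortest description length."""
    return min(range(1, k_max + 1), key=lambda k: description_length(points, k))


# Usage: three well-separated 2-D blobs of 100 points each.
rng = random.Random(0)
blobs = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
points = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
          for cx, cy in blobs for _ in range(100)]
print(choose_k(points, 6))
```

Splitting a cluster always lengthens the label code, so an extra cluster is accepted only if it shortens the residual code by more than it adds; on well-separated blobs like these the criterion should settle on the number of generated blobs.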
A novel neuro-fuzzy approach to nonlinear dimensionality reduction is proposed. The approach is an auto-associative modification of the Neuro-Fuzzy Kolmogorov's Network (NFKN) with a "bottleneck" hidden layer. Two training algorithms are considered. The theoretical results and the advantages of the proposed model are confirmed by an experiment in nonlinear principal component analysis and by an application to the visualization of high-dimensional wastewater treatment plant data.