Katharina Tschumitschew scite author profile

2010

Evolving Systems

Quantiles play an important role in data analysis. On-line estimation of quantiles for streaming data (i.e. data arriving step by step over time), especially on devices with limited memory and computation capacity such as electronic control units, is not as simple as incremental or recursive estimation of characteristics like the mean (expected value) or the variance. In this paper, we propose an algorithm for incremental quantile estimation that overcomes restrictions of previously described techniques. We also develop a statistical test for our algorithm to detect changes, so that the on-line estimation of the quantiles can be carried out in an adaptive or evolving manner. Besides a statistical analysis of our algorithm, we also provide experimental results comparing our algorithm with a recursive quantile estimation technique that is restricted to continuous random variables.
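For readers unfamiliar with recursive quantile estimation, the sketch below shows one well-known scheme of this kind, a stochastic-approximation (Robbins-Monro style) update. It is not the algorithm proposed in the paper, and whether it matches the technique used in the paper's comparison is an assumption. The function name `recursive_quantile` and the step-size constant `c` are hypothetical names chosen for illustration.

```python
import random


def recursive_quantile(stream, p, c=1.0):
    """Recursive estimate of the p-quantile using O(1) memory.

    Illustrative stochastic-approximation update (an assumption, not the
    paper's method). It is intended for continuous random variables.
    """
    it = iter(stream)
    q = float(next(it))  # start from the first observation
    for t, x in enumerate(it, start=1):
        # The estimate moves down when x <= q and up otherwise,
        # with a step size that shrinks like c / t.
        indicator = 1.0 if x <= q else 0.0
        q += (c / t) * (p - indicator)
    return q


# Usage: estimate the median of uniform(0, 1) data, which is 0.5.
rng = random.Random(0)
samples = (rng.random() for _ in range(20000))
median_estimate = recursive_quantile(samples, p=0.5)
print(median_estimate)
```

Only the current estimate and a counter are stored, which is what makes such schemes attractive on memory-limited devices. The update relies on the distribution being continuous around the quantile, consistent with the restriction mentioned in the abstract.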

Effects of drift and noise on the optimal sliding window size for data stream regression models

Tschumitschew

Communications in Statistics - Theory and Methods

2016

Citation: Effects of drift and noise on the optimal sliding window size for data stream regression models, 2016, 46 (10).

The analysis of non-stationary data streams requires continuous adaptation of the model to the most recent relevant data. This requires that changes in the data stream be distinguished from noise. Many approaches are based on heuristic adaptation schemes. We analyse simple regression models to understand the joint effects of noise and concept drift and derive the optimal sliding window size for the regression models. Our theoretical analysis and simulations show that a near-optimal window size can be crucial. Our models can be used as benchmarks for other models to see how they cope with noise and drift.
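The trade-off between noise and drift can be illustrated with a toy model that is an assumption made here, not the paper's derivation: a constant model (the window mean) tracks a signal with linear drift `drift` per step and independent noise of standard deviation `sigma`. Averaging the last `w` points then has bias `drift * (w - 1) / 2` and variance `sigma**2 / w`, so a larger window reduces noise but lags behind the drift. The function names below are hypothetical.

```python
def window_mse(w, drift, sigma):
    """Mean squared error of a window mean of size w as an estimate of the
    current level, under the toy model y_t = drift * t + noise."""
    bias = drift * (w - 1) / 2.0  # lag caused by averaging older points
    variance = sigma ** 2 / w     # noise averaged over w points
    return bias ** 2 + variance


def best_window(drift, sigma, max_w=1000):
    """Window size in 1..max_w that minimises the toy-model error."""
    return min(range(1, max_w + 1), key=lambda w: window_mse(w, drift, sigma))


best = best_window(drift=0.1, sigma=1.0)
print(best, window_mse(best, 0.1, 1.0))
```

In this toy setting, stronger drift pushes the best window towards smaller sizes and stronger noise towards larger ones; without drift the largest allowed window is always best.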

Cluster Validity Measures Based on the Minimum Description Length Principle

Georgieva

2011

Determining the number of clusters is a crucial problem in cluster analysis. Cluster validity measures are one way to try to find the optimum number of clusters, especially for prototype-based clustering. However, no validity measure turns out to work well in all cases. In this paper, we propose an approach to determine the number of clusters based on the minimum description length principle, which does not incur high computational costs and is also applicable in the context of fuzzy clustering.

A Neuro-Fuzzy Model for Dimensionality Reduction and Its Application

Kolodyazhniy

Int. J. Unc. Fuzz. Knowl. Based Syst.

2007

A novel neuro-fuzzy approach to nonlinear dimensionality reduction is proposed. The approach is an auto-associative modification of the Neuro-Fuzzy Kolmogorov's Network (NFKN) with a "bottleneck" hidden layer. Two training algorithms are considered. The validity of theoretical results and the advantages of the proposed model are confirmed by an experiment in nonlinear principal component analysis and an application in the visualization of high-dimensional wastewater treatment plant data.

Measuring and Visualising Similarity of Customer Satisfaction Profiles for Different Customer Segments

Nauck

2009