2014
DOI: 10.1016/j.patcog.2014.03.006

Cross-entropy clustering

Abstract: We build a general and highly applicable clustering theory, which we call cross-entropy clustering (CEC for short), which joins the advantages of classical k-means (easy implementation and speed) with those of EM (affine invariance and the ability to adapt to clusters of desired shapes). Moreover, contrary to k-means and EM, CEC finds the optimal number of clusters by automatically removing groups which carry no information. Although CEC, similarly to EM, can be built on an arbitrary family of densities, in the most impo…
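
As a rough orientation only, here is a minimal Python sketch (our own, not the authors' code) of the Gaussian-family CEC energy the abstract refers to: each cluster pays its weighted Gaussian entropy plus a -p ln p term, so small groups that carry no information become expensive and can be removed. Function names, the NumPy sample-covariance estimate, and the toy usage at the end are our assumptions.

import numpy as np

def gaussian_cluster_cost(points, n_total):
    # Weighted cost of one cluster under the Gaussian CEC criterion:
    # p * ( -ln p + dim/2 * ln(2*pi*e) + 1/2 * ln det(Sigma) ).
    n, dim = points.shape
    p = n / n_total  # relative cluster size
    # Sample covariance (np.cov divides by n-1; the paper's ML estimate divides by n).
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    _, logdet = np.linalg.slogdet(cov)
    return p * (-np.log(p) + 0.5 * dim * np.log(2 * np.pi * np.e) + 0.5 * logdet)

def cec_energy(clusters):
    # Total energy of a clustering, given as a list of (n_i, dim) arrays; lower is better.
    # The -p*ln(p) term is what lets CEC drop uninformative groups during optimization.
    n_total = sum(len(c) for c in clusters)
    return sum(gaussian_cluster_cost(c, n_total) for c in clusters)

# Toy check: two well-separated blobs should be cheaper as two clusters than as one.
rng = np.random.default_rng(0)
a = rng.normal(size=(200, 2))
b = rng.normal(loc=8.0, size=(50, 2))
print(cec_energy([a, b]), cec_energy([np.vstack([a, b])]))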

Cited by 71 publications (54 citation statements)
References 32 publications (25 reference statements)

“…This is due to the so-called uniformity effect, which causes these algorithms to generate clusters of similar sizes. This is especially vivid in the case of centroid-based approaches [60], while density-based ones seem to display some robustness to it [52]. Clustering imbalanced data can be seen from various perspectives: as a process of group discovery in its own right, as a method for reducing the complexity of a given problem, or as a way to analyze the structure of the minority class.…”
Section: Semi-supervised and Unsupervised Learning From Imbalanced Data
Mentioning, confidence: 99%
“…First, we introduce the cost function which will be optimized by the algorithm. Our approach is based on CEC [18]. Therefore, we start with a short introduction to the method.…”
Section: Theoretical Background of UCEC
Mentioning, confidence: 99%
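
For context, the cost function these snippets refer to can be written, in our own notation reconstructed from the cited CEC paper [18] (not a quote from the citing work), as

\[
  E(X_1,\dots,X_k) \;=\; \sum_{i=1}^{k} p_i \Bigl( -\ln p_i + H^{\times}(X_i \,\|\, \mathcal{F}) \Bigr),
  \qquad p_i = \frac{|X_i|}{|X|},
\]

where \(H^{\times}(X_i \,\|\, \mathcal{F}) = \inf_{f \in \mathcal{F}} \bigl( -\tfrac{1}{|X_i|} \sum_{x \in X_i} \ln f(x) \bigr)\) is the cross-entropy of cluster \(X_i\) with respect to the chosen density family \(\mathcal{F}\); minimizing \(E\) over partitions (and over the number of clusters \(k\)) is the optimization the citing paper builds on.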
“…More precisely, we use a uniform pdf for independent variables, which is a product of univariate marginal pdfs, so the distribution will generally have rectangular support. Furthermore, a simpler optimization procedure known as Cross-Entropy Clustering (CEC) [18] is used instead of EM.…”
Section: Introduction
Mentioning, confidence: 99%
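
To illustrate the rectangular support mentioned above (our own worked example, not part of the citing paper): if the density family consists of products of univariate uniform pdfs, i.e. uniform densities on axis-aligned boxes \(R = \prod_{j}[a_j, b_j]\), then for a cluster \(X_i \subset R\)

\[
  H^{\times}(X_i \,\|\, U_R) \;=\; -\frac{1}{|X_i|} \sum_{x \in X_i} \ln \frac{1}{\operatorname{vol}(R)} \;=\; \sum_{j} \ln (b_j - a_j),
\]

so the optimal box is simply the coordinate-wise range (bounding box) of the cluster, and its cost is the log-volume of that box.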
“…Example 2 (Gaussian distribution [28]): For the multivariate Gaussian distribution, the entropy goes as the log determinant of the covariance; specifically, the differential entropy of an N-dimensional random variable with the density function…”
Section: Remark
Mentioning, confidence: 99%
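
The quoted sentence is cut off; the standard formula it leads up to is the differential entropy of an N-dimensional Gaussian with covariance matrix \(\Sigma\),

\[
  H(X) \;=\; \frac{N}{2}\ln(2\pi e) \;+\; \frac{1}{2}\ln \det \Sigma ,
\]

which indeed grows with the log determinant of the covariance.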