Hubness-Based Clustering of High-Dimensional Data

Tomašev, Nenad; Radovanović, Miloš; Mladenić, Dunja; Ivanović, Mirjana

doi:10.1007/978-3-319-09259-1_11

Cited by 13 publications

(6 citation statements)

References 69 publications

“…Additionally, since it was demonstrated on several occasions that a better handling of hub points may result in better overall clustering quality in many-dimensional problems [40,83,84], we intend to consider either extending the existing clustering quality indexes or proposing new ones that would incorporate this finding into account.…”

Section: Perspectives and Future Directions (mentioning, confidence: 97%)

Clustering Evaluation in High-Dimensional Data

Tomašev

Radovanović

2016

Unsupervised Learning Algorithms

Self Cite

Clustering evaluation plays an important role in unsupervised learning systems, as it is often necessary to automatically quantify the quality of generated cluster configurations. This is especially useful for comparing the performance of different clustering algorithms as well as determining the optimal number of clusters in clustering algorithms that do not estimate it internally. Many clustering quality indexes have been proposed over the years and different indexes are used in different contexts. There is no unifying protocol for clustering evaluation, so it is often unclear which quality index to use in which case. In this chapter, we review the existing clustering quality measures and evaluate them in the challenging context of high-dimensional data clustering. High-dimensional data is sparse and distances tend to concentrate, possibly affecting the applicability of various clustering quality indexes. We analyze the stability and discriminative power of a set of standard clustering quality measures with increasing data dimensionality. Our evaluation shows that the curse of dimensionality affects different clustering quality indexes in different ways and that some are to be preferred when determining clustering quality in many dimensions.
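The abstract attributes part of the difficulty to distances that "tend to concentrate" in many dimensions. The following is a minimal sketch of that effect, assuming i.i.d. uniform data in the unit hypercube, Euclidean distance, and NumPy (all our choices, not taken from the chapter). It measures the relative contrast (max - min) / min of the distances from a random query to the data points. The function name `relative_contrast` is hypothetical.

```python
import numpy as np


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (max - min) / min of the Euclidean distances from one random
    query to n_points i.i.d. uniform points in the unit hypercube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    data = rng.random((n_points, dim))     # assumed data distribution
    query = rng.random(dim)                # query drawn from the same distribution
    dists = np.linalg.norm(data - query, axis=1)
    return float((dists.max() - dists.min()) / dists.min())


if __name__ == "__main__":
    # The gap between nearest and farthest neighbour shrinks as dim grows.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(1000, dim):.3f}")
```

The printed contrast should fall steadily with `dim`; exact values depend on the seed. Indexes that compare within-cluster and between-cluster distances have less contrast to work with as this ratio shrinks, which is one plausible reason their discriminative power changes with dimensionality.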

“…Hubness-based clustering has recently been proposed for high-dimensional clustering problems [83,84] and has been successfully applied in some domains like document clustering [40].…”

Section: Clustering Techniques For High-dimensional Data (mentioning, confidence: 99%)

Clustering Evaluation in High-Dimensional Data

Tomašev

Radovanović

2016

Unsupervised Learning Algorithms

Self Cite
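The quoted statement refers to hubness-based clustering [83,84]. The following is a rough, simplified sketch in the spirit of a K-hubs-style loop. It works like K-means, except that each cluster's prototype is the member with the highest k-occurrence count N_k (the number of times a point appears among the k nearest neighbours of other points), i.e. its strongest local hub. It assumes Euclidean distance, random initial prototypes, and NumPy. `k_occurrences`, `assign` and `khubs` are hypothetical names, and this is not the authors' implementation.

```python
import numpy as np


def k_occurrences(data: np.ndarray, k: int) -> np.ndarray:
    """N_k(x): how often each point occurs among the k nearest neighbours
    of the other points (Euclidean distance, a point is not its own neighbour)."""
    sq = np.sum(data ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T   # squared distances
    np.fill_diagonal(d2, np.inf)
    knn = np.argpartition(d2, k, axis=1)[:, :k]            # k nearest per row, unordered
    return np.bincount(knn.ravel(), minlength=len(data))


def assign(data: np.ndarray, centres: np.ndarray) -> np.ndarray:
    """Label each point with the index of its nearest prototype."""
    dists = np.linalg.norm(data[:, None, :] - data[centres][None, :, :], axis=2)
    return dists.argmin(axis=1)


def khubs(data: np.ndarray, n_clusters: int, k: int = 10,
          n_iter: int = 20, seed: int = 0):
    """Simplified K-hubs-style clustering: prototypes are data points, and each
    update picks the cluster member with the highest (global) N_k score."""
    rng = np.random.default_rng(seed)
    nk = k_occurrences(data, k)
    centres = rng.choice(len(data), size=n_clusters, replace=False)
    for _ in range(n_iter):
        labels = assign(data, centres)
        new_centres = centres.copy()
        for c in range(n_clusters):
            members = np.flatnonzero(labels == c)
            if members.size:                      # keep old prototype if empty
                new_centres[c] = members[np.argmax(nk[members])]
        if np.array_equal(new_centres, centres):  # prototypes are stable
            break
        centres = new_centres
    return assign(data, centres), centres


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two Gaussian blobs in 50 dimensions (illustrative data only).
    blobs = np.vstack([rng.normal(0.0, 1.0, (150, 50)),
                       rng.normal(4.0, 1.0, (150, 50))])
    labels, centres = khubs(blobs, n_clusters=2)
    print(np.bincount(labels), centres)
```

Using actual data points as prototypes avoids computing means. Picking the member with the highest N_k favours prototypes that many points already treat as close neighbours. Like K-means, the result depends on the random initial prototypes.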

“…Unfortunately, some of the hubs are bad in the sort of sense that they may mislead machine learning algorithms. The presence of hubs have been studied primarily in context of classification, clustering and instance selection, see (Radovanović et al, 2010a), (Tomašev and Mladenić, 2013), (Radovanović et al, 2009), (Radovanović et al, 2010b), (Tomašev et al, 2011), (Tomašev et al, 2015b), , and (Tomašev et al, 2015a) for a survey.…”

Section: Related Work (mentioning, confidence: 99%)

ParkinsoNET: Estimation of UPDRS Score Using Hubness-Aware Feedforward Neural Networks

Búza

Varga

2016

Applied Artificial Intelligence

Parkinson's disease is a common neurodegenerative disorder worldwide, with increasing incidence. Speech disturbance appears during the progression of the disease. UPDRS is a gold-standard tool for the diagnosis and follow-up of the disease. We aim to estimate the UPDRS score based on biomedical voice recordings. In this paper, we study the hubness phenomenon in the context of UPDRS score estimation and propose hubness-aware error correction for feed-forward neural networks in order to increase the accuracy of estimation. We perform experiments on publicly available datasets derived from real voice data and show that the proposed technique systematically increases the accuracy of various feed-forward neural networks.
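Hubness is commonly quantified through the k-occurrence N_k(x), the number of times a point x appears among the k nearest neighbours of other points, and through the skewness of the N_k distribution. The following is a minimal sketch of that measurement, assuming Euclidean distance, i.i.d. uniform data, and NumPy. The function names are hypothetical, and nothing here reproduces the hubness-aware error correction proposed in the paper.

```python
import numpy as np


def k_occurrences(data: np.ndarray, k: int) -> np.ndarray:
    """N_k(x): how often each point occurs among the k nearest neighbours
    of the other points (Euclidean distance, a point is not its own neighbour)."""
    sq = np.sum(data ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * data @ data.T   # squared distances
    np.fill_diagonal(d2, np.inf)
    knn = np.argpartition(d2, k, axis=1)[:, :k]            # k nearest per row, unordered
    return np.bincount(knn.ravel(), minlength=len(data))


def skewness(values: np.ndarray) -> float:
    """Standardized third moment (population version) of a 1-D sample."""
    centred = values.astype(float) - values.mean()
    return float(np.mean(centred ** 3) / np.mean(centred ** 2) ** 1.5)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for dim in (3, 20, 100):
        nk = k_occurrences(rng.random((2000, dim)), k=5)
        print(f"dim={dim:3d}  skew(N_5)={skewness(nk):.2f}  max N_5={nk.max()}")
```

For such data, the skewness of N_k is expected to grow with dimensionality: a few points (hubs) occur in many k-neighbour lists while many others occur in few or none. Exact values vary with the seed and sample size.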

“…A feature selection technique explores the possibility of dimensional subset for carrying out clustering by eliminating unnecessary and inappropriate dimensions. Subspace clustering is one of such technique that positions its search operation and generates information about the clusters present in multiple subspaces in overlapping conditions [4] [5].…”

Section: Introduction (mentioning, confidence: 99%)

EDSC: Efficient document subspace clustering technique for high-dimensional data

Radhika

et al. 2016

2016 International Conference on Computational Techniques in Information and Communication Technologies (ICCTICT)

With the advancement of pervasive technology, there is a sudden rise in the size of data. Such data are generated from various sources, from the individual to the organization level. Because data are often represented in unstructured or semi-structured form, existing data analytics approaches are not directly applicable, which leads to the curse of dimensionality problem. Hence, this paper presents an Efficient Document Subspace Clustering (EDSC) technique for high-dimensional data that contributes to the existing system with respect to identification by eliminating redundant data. Discrete segmentation of data points is used to explicitly expose the dimensionality of hidden subspaces in the clusters. The outcome of the proposed system was compared with the existing system to find an effective document clustering process for high-dimensional data. The processing time of EDSC for subspace clustering is reduced by 50% compared to the existing system.
