2019
DOI: 10.1038/s41598-019-44892-y

Element-centric clustering comparison unifies overlaps and hierarchy

Abstract: Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering compar…
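To make the comparison concrete, below is a minimal sketch of an element-centric similarity for the special case of two disjoint partitions, using the closed-form personalized-PageRank affinities that apply to hard clusterings: a restart weight of 1 − α stays on the element itself and α is spread uniformly over its cluster. The function name `element_sim` and the default α = 0.9 are illustrative choices, not the paper's exact interface; the authors' released reference implementation is the CluSim Python package.

```python
import numpy as np

def element_sim(labels_a, labels_b, alpha=0.9):
    """Element-centric similarity sketch for two hard partitions.

    For disjoint clusters, the personalized-PageRank affinity of element i is
    p_ij = alpha / |c_i| for every j in i's cluster, plus an extra (1 - alpha)
    at j = i.  The element-wise score is
        S_i = 1 - (1 / (2 * alpha)) * sum_j |p^A_ij - p^B_ij|,
    and the clustering-level similarity is the mean of the S_i.
    """
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    n = labels_a.shape[0]
    scores = np.empty(n)
    for i in range(n):
        in_a = labels_a == labels_a[i]          # cluster-mates of i in A
        in_b = labels_b == labels_b[i]          # cluster-mates of i in B
        p_a = np.where(in_a, alpha / in_a.sum(), 0.0)
        p_b = np.where(in_b, alpha / in_b.sum(), 0.0)
        p_a[i] += 1.0 - alpha                   # restart mass stays on i
        p_b[i] += 1.0 - alpha
        scores[i] = 1.0 - np.abs(p_a - p_b).sum() / (2.0 * alpha)
    return scores.mean(), scores
```

Calling, for example, `element_sim([0, 0, 1, 1], [0, 0, 0, 1])` returns the mean score together with the per-element scores, so disagreements can be localized to individual elements rather than reported only as one aggregate number.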


Cited by 89 publications (82 citation statements)
References 61 publications (71 reference statements)

Citation statements, ordered by relevance:
“…Disjointedness and cohesion strength take the community identity of the other nodes into account, but disjointedness focuses on how a node independently changes its community identity apart from the other nodes, whereas cohesion strength only counts mutual companionship without taking the absence of companionship into account. The Rand index [16] measures the similarity of data clusterings, but it is a cluster-centric measure [17], while CoI is node-centric. The aforementioned measures also utilize the fuzziness of community [18], as CoI does.…”
Section: Nodes (mentioning)
confidence: 99%
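As a point of reference for the contrast drawn in the statement above, the sketch below computes the plain pair-counting Rand index. It returns a single clustering-level number and assigns nothing to individual elements, which is the sense in which it is cluster- rather than node-centric. The helper name `rand_index` is illustrative; in practice `sklearn.metrics.adjusted_rand_score` is the usual chance-corrected variant.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Pair-counting Rand index: the fraction of element pairs on which the
    two clusterings agree (grouped together in both, or separated in both).
    The result is one aggregate number for the whole clustering; no
    per-element score is produced."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```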
“…Such analysis may be seen as a time-varying adaptation of the concept of ‘frustrated clusterings’ proposed by Gates et al. (2019), which refers to observations for which the method ‘cannot consistently decide on a grouping’ (p. 8). Identify the distinctive features of non-persistent banks (relative to persistent ones) by comparing banks that changed their business model in a given triennium (t+1) with other banks that held the same business model in the triennium prior to the change (t) and did not change their business model in t+1, with respect to the features exhibited by both banks in triennium t. To carry out this analysis, we run Bayesian logistic regressions (J regressions; i.e.…”
Section: Methods (mentioning)
confidence: 99%
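The ‘frustrated clusterings’ idea referenced above (observations a method cannot consistently place) can be probed with a per-element stability check. The sketch below is not the paper's element-centric procedure; it is a simpler, self-contained proxy that scores each observation by the Jaccard overlap of its cluster-mates across repeated k-means runs. All names and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def per_element_stability(X, n_clusters=3, n_runs=20, seed=0):
    """Average Jaccard overlap of each element's cluster-mates across
    repeated k-means runs; low values flag observations whose grouping the
    method cannot consistently decide on."""
    rng = np.random.default_rng(seed)
    runs = [
        KMeans(n_clusters=n_clusters, n_init=10,
               random_state=int(rng.integers(10**6))).fit_predict(X)
        for _ in range(n_runs)
    ]
    n = X.shape[0]
    stability = np.zeros(n)
    n_pairs = 0
    for a in range(n_runs):
        for b in range(a + 1, n_runs):
            for i in range(n):
                mates_a = set(np.flatnonzero(runs[a] == runs[a][i]))
                mates_b = set(np.flatnonzero(runs[b] == runs[b][i]))
                stability[i] += len(mates_a & mates_b) / len(mates_a | mates_b)
            n_pairs += 1
    return stability / n_pairs
```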
“…As expected, as the disturbances become larger, the similarity of classifications with the baseline sample decreases for all measures (e.g. Gates et al., 2019). This result suggests that, although the approach handles small disturbances well, practitioners should strive to use a stable sample in a scenario where business model analysis is performed in a time-varying setting (e.g.…”
Section: Robustness Checks (mentioning)
confidence: 99%
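The disturbance analysis described in the statement above can be reproduced generically with a small loop: recluster after adding increasing amounts of feature noise and compare each result with the baseline clustering. The sketch below uses k-means and the adjusted Rand index purely as stand-ins; any clustering method and any comparison measure, including the element-centric one, can be substituted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def perturbation_check(X, n_clusters=3, noise_scales=(0.01, 0.05, 0.10), seed=0):
    """Compare clusterings of increasingly disturbed copies of X against the
    clustering of the unperturbed baseline sample."""
    rng = np.random.default_rng(seed)

    def fit(data):
        return KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(data)

    baseline = fit(X)
    feature_sd = X.std(axis=0)
    similarity = {}
    for s in noise_scales:
        # Gaussian noise scaled per feature, proportional to its spread.
        disturbed = X + rng.normal(0.0, s * feature_sd, size=X.shape)
        similarity[s] = adjusted_rand_score(baseline, fit(disturbed))
    return similarity
```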
“…Essentially, these measures assess clustering methods from different viewpoints, and in practice, there is no clustering method that could possibly reach the best performance in all of these performance metrics for a given problem domain [24]. A number of studies revolve around developing performance measures for clustering methods with the aim of determining the appropriateness of the produced clusters [25], [26]. However, surprisingly, although there is an increasing consensus concerning the importance of properly identifying the best clustering method and subsequently interpreting the produced result for a given problem, a limited number of research studies [26]–[28], if any, have comprehensively considered both internal and external measurements for the evaluation of clustering methods in an educational context.…”
Section: Introduction (mentioning)
confidence: 99%
“…A number of studies revolve around developing performance measures for clustering methods with the aim of determining the appropriateness of the produced clusters [25], [26]. However, surprisingly, although there is an increasing consensus concerning the importance of properly identifying the best clustering method and subsequently interpreting the produced result for a given problem, a limited number of research studies [26]–[28], if any, have comprehensively considered both internal and external measurements for the evaluation of clustering methods in an educational context. In addition to the tedious process behind the experimentation and the data preprocessing, one main reason is that cluster evaluation normally involves multiple conflicting criteria (due to a large number of external and internal metrics).…”
Section: Introduction (mentioning)
confidence: 99%
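To illustrate the internal-versus-external distinction made in the statement above, the toy sketch below scores the same k-means solutions with one internal measure (silhouette, which needs no ground truth) and one external measure (adjusted Rand index against known labels). The synthetic data and the specific measures are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic data with known generating labels, so both kinds of measure apply.
X, y_true = make_blobs(n_samples=300, centers=4, random_state=0)

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    internal = silhouette_score(X, labels)           # uses only X and labels
    external = adjusted_rand_score(y_true, labels)   # uses the planted labels
    print(f"k={k}: silhouette={internal:.3f}, ARI={external:.3f}")
```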