Formal concept analysis (FCA) has been applied successively in diverse fields such as data mining, conceptual modeling, social networks, software engineering, and the semantic web. One shortcoming of FCA, however, is the large number of concepts that typically arise in dense datasets hindering typical tasks such as rule generation and visualization. To overcome this shortcoming, it is important to develop formalisms and methods to segment, categorize and cluster formal concepts. The first step in achieving these aims is to define suitable similarity and dissimilarity measures of formal concepts. In this paper we propose three similarity measures based on existent set-based measures in addition to developing the completely novel zeros-induced measure. Moreover, we formally prove that all the measures proposed are indeed similarity measures and investigate the computational complexity of computing them. Finally, an extensive empirical evaluation on real-world data is presented in which the utility and character of each similarity measure is tested and evaluated.
Conventional clustering algorithms group similar data points together along one dimension of a data table. Bi-clustering simultaneously clusters both dimensions of a data table. 3-clustering goes one step further and aims to concurrently cluster two data tables that share a common set of row labels, but whose column labels are distinct. Such clusters reveal the underlying connections between the elements of all three sets. We present a novel algorithm that discovers 3-clusters across vertically partitioned data. Our approach presents two new and important formulations: first we introduce the notion of a 3-cluster in partitioned data; and second we present a mathematical formulation that measures the quality of such clusters. Our algorithm discovers high quality, arbitrarily positioned, overlapping clusters, and is efficient in time. These results are exhibited in a comprehensive study on real datasets.
A fundamental task of data analysis is comprehending what distinguishes clusters found within the data. We present the problem of mining distinguishing sets which seeks to find sets of objects or attributes that induce that most change among the incremental bi-clusters of a binary dataset. Unlike emerging patterns and contrast sets which only focus on statistical differences between support of itemsets, our approach considers distinctions in both the attribute space and the object space. Viewing the lattice of bi-clusters formed within a data set as a weighted directed graph, we mine the most significant distinguishing sets by growing a maximal cost spanning tree of the lattice. In this paper we present a weighting function for measuring distinction among bi-clusters in the lattice and the novel MIDS algorithm. MIDS simultaneously enumerates biclusters, constructs the bi-cluster lattice, and computes the distinguishing sets. The efficient computational performance of MIDS is exhibited in a performance test on real world and benchmark data sets. The utility of distinguishing sets is also demonstrated with experiments on synthetic and real data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.