Abstract. In this paper, we tackle the novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes C+ and C− and a query object o, we want to find the top-k subspaces S that maximize the ratio of the likelihood of o in C+ to that in C−. We demonstrate that this problem has important applications and, at the same time, is very challenging: it does not even allow polynomial-time approximation. We present CSMiner, a mining method with various pruning techniques. CSMiner is substantially faster than the baseline method. Our experimental results on real data sets verify the effectiveness and efficiency of our method.
We tackle the novel problem of mining contrast subspaces. Given a set of multidimensional objects in two classes C+ and C− and a query object o, we want to find the top-k subspaces that maximize the ratio of likelihood of o in C+ against that in C−. Such subspaces are very useful for characterizing an object and explaining how it differs between two classes. We demonstrate that this problem has important applications, and, at the same time, is very challenging, being MAX SNP-hard. We present CSMiner, a mining method that uses kernel density estimation in conjunction with various pruning techniques.
We experimentally investigate the performance of CSMiner on a range of data sets, evaluating its efficiency, effectiveness, and stability and demonstrating that it is substantially faster than a baseline method.
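To make the problem statement concrete, the following minimal sketch scores every non-empty subspace by brute force and returns the k subspaces with the largest likelihood ratio. It is not CSMiner and uses none of its pruning techniques. The Gaussian kernel with a fixed bandwidth h, the small eps guard against division by zero, and all function names are illustrative assumptions rather than details taken from the paper.

```python
import heapq
import math
from itertools import combinations


def kde(o, points, subspace, h=1.0):
    """Gaussian kernel density estimate of o, using only the dimensions in `subspace`.

    Assumption: one fixed bandwidth h for every subspace (chosen for simplicity).
    """
    d = len(subspace)
    norm = len(points) * (h ** d) * (2 * math.pi) ** (d / 2)
    total = 0.0
    for x in points:
        sq_dist = sum((o[i] - x[i]) ** 2 for i in subspace)
        total += math.exp(-sq_dist / (2 * h * h))
    return total / norm


def likelihood_contrast(o, pos, neg, subspace, h=1.0, eps=1e-12):
    """Ratio of the estimated likelihood of o in C+ (pos) to that in C- (neg)."""
    return kde(o, pos, subspace, h) / (kde(o, neg, subspace, h) + eps)


def top_k_contrast_subspaces(o, pos, neg, k=1, h=1.0):
    """Exhaustive baseline: score all 2^d - 1 non-empty subspaces, keep the best k."""
    dims = range(len(o))
    scored = []
    for size in range(1, len(o) + 1):
        for subspace in combinations(dims, size):
            score = likelihood_contrast(o, pos, neg, subspace, h)
            scored.append((score, subspace))
    return heapq.nlargest(k, scored)
```

For example, with o = (0.0, 0.0), a positive class clustered near o in dimension 0 but away from it in dimension 1, and a negative class placed the opposite way, the single-dimension subspace (0,) receives the highest score. The enumeration is exponential in the number of dimensions, so a sketch like this is only practical for small d.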
A finite family of subsets of a finite set is said to be evolutionary if its members can be ordered so that each subset except the first has an element in the union of the previous subsets and also an element not in that union. The study of evolutionary families is motivated by a conjecture of Naddef and Pulleyblank concerning ear decompositions of 1-extendable graphs. The present paper gives some sufficient conditions for a family to be evolutionary.
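The definition above can be checked directly on small families. The sketch below tries every ordering, so it is a brute-force illustration of the definition only and does not implement any of the sufficient conditions from the paper. The function names are hypothetical, and treating the empty family as evolutionary is an assumption made for convenience.

```python
from itertools import permutations


def is_evolutionary_order(order):
    """Check one ordering: every set after the first must meet the union of
    the previous sets and must also contain an element outside that union."""
    union = set(order[0])
    for s in order[1:]:
        if not (s & union) or not (s - union):
            return False
        union |= s
    return True


def is_evolutionary(family):
    """Return True if some ordering of the family satisfies the definition.

    Tries all n! orderings, so it is only suitable for very small families.
    """
    sets = [frozenset(s) for s in family]
    if not sets:
        return True  # assumption: the empty family is vacuously evolutionary
    return any(is_evolutionary_order(p) for p in permutations(sets))
```

Under this check, the family {1,2}, {2,3}, {3,4} is evolutionary in the order given, whereas two disjoint sets such as {1,2} and {3,4} are not, since the second set never meets the union of the earlier ones.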
Relationship management is critical in business. In particular, it is important to detect abnormal relationships, such as fraudulent relationships between service providers and consumers. Surprisingly, the literature contains no systematic study on detecting relationship outliers, and no existing method can detect and handle relationship outliers between groups and individuals in groups. In this thesis, we tackle this important problem by developing a simple yet effective model. We identify two types of outliers and devise efficient detection algorithms. Our experiments on both real and synthetic data sets confirm the effectiveness and efficiency of our approach.