Proceedings of the 2009 SIAM International Conference on Data Mining
DOI: 10.1137/1.9781611972795.92
Near-optimal supervised feature selection among frequent subgraphs

Abstract: Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in program flows. Among the various approaches proposed in the literature, graph classification based on frequent subgraphs is a popular branch: graphs are represented as (usually binary) vectors, with components indicating whether a graph contains a particular subgraph that is frequent across the dataset. On large graphs…
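The representation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "graphs" are stood in for by sets of edge labels so that subgraph containment reduces to a subset test, whereas a real pipeline would mine frequent patterns with a tool such as gSpan and use subgraph-isomorphism checks.

```python
def contains(graph, pattern):
    # Placeholder containment test: a "graph" here is just a set of edge
    # labels, so pattern containment reduces to subset checking.
    return pattern <= graph

def to_feature_vectors(graphs, frequent_patterns):
    """Encode each graph as a binary vector: component i is 1 iff the
    graph contains frequent pattern i."""
    return [
        [1 if contains(g, p) else 0 for p in frequent_patterns]
        for g in graphs
    ]

# Toy dataset: three "graphs" and two patterns frequent across the dataset.
graphs = [{"C-C", "C-O"}, {"C-C", "C-N"}, {"C-O"}]
patterns = [{"C-C"}, {"C-O"}]
print(to_feature_vectors(graphs, patterns))
# -> [[1, 1], [1, 0], [0, 1]]
```

Each graph thus becomes a fixed-length binary vector that can be fed to any standard classifier.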



Cited by 98 publications (97 citation statements). References 36 publications (46 reference statements).
“…We run the discriminatory pattern sampling algorithm with a minimum support value of 6 (20%). The leap search [34], and CORK [23] algorithms did not run on this dataset. Both failed with a segmentation fault.…”
Section: Sampling Results on Large Graphs
Confidence: 99%
“…Though the majority of these algorithms consider the summarization of itemset patterns, [34,23] consider graph patterns. In another recent work in the graph domain, Hasan et.…”
Section: Related Work
Confidence: 99%
“…For instance, CBA [1] first computes all frequent itemsets (with their most frequent class label) and then induces an ordered rule-list classifier by removing redundant itemsets. Several alternative techniques (for instance, [28,30]) define measures of redundancy and ways to select only a limited number of patterns. Constructing a concise pattern set for use in classification can be seen as a form of feature selection.…”
Section: Global Heuristic Two Step Techniques
Confidence: 99%
“…However, their approach is prone to model overfitting as it does not provide any regularization capability. Very recently, a greedy subgraph feature selection algorithm, named CORK [64], was proposed. It is embedded in the gSpan [96] mining process and it can provide an approximation guarantee with respect to a submodular quality criterion.…”
Section: Mining Discriminative Patterns
Confidence: 99%
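The greedy selection scheme described in that statement can be sketched as follows. This is an illustrative stand-in, not CORK's exact quality criterion: the score below counts pairs of differently-labelled examples that are separated (i.e., distinguished) by at least one selected feature. Such a coverage function is submodular, which is what gives greedy selection its approximation guarantee; the data, function names, and criterion are all hypothetical.

```python
from itertools import combinations

def separated_pairs(X, y, selected):
    """Pairs (i, j) with different labels that differ on some selected feature."""
    return {
        (i, j)
        for i, j in combinations(range(len(X)), 2)
        if y[i] != y[j] and any(X[i][f] != X[j][f] for f in selected)
    }

def greedy_select(X, y, k):
    """Greedily add the feature that most increases the number of
    separated cross-class pairs (a submodular coverage objective)."""
    selected = []
    for _ in range(k):
        best = max(
            (f for f in range(len(X[0])) if f not in selected),
            key=lambda f: len(separated_pairs(X, y, selected + [f])),
        )
        selected.append(best)
    return selected

# Toy binary feature matrix (rows: graphs, columns: frequent subgraphs).
X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y = [0, 0, 1, 1]
print(greedy_select(X, y, 1))  # -> [0]: feature 0 alone separates all cross-class pairs
```

Because the objective is submodular and monotone, the classic Nemhauser–Wolsey–Fisher result bounds the greedy solution at a (1 - 1/e) fraction of the optimum, which is the flavour of guarantee the citation statement refers to.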