2014
DOI: 10.1007/978-3-319-12640-1_37

Synthetic Test Data Generation for Hierarchical Graph Clustering Methods

Abstract: Recent achievements in graph-based clustering algorithms revealed the need for large-scale test data sets. This paper introduces a procedure that can provide synthetic but realistic test data to the hierarchical Markov clustering algorithm. Being created according to the structure and properties of the SCOP95 protein sequence data set, the synthetic data act as a collection of proteins organized in a four-level hierarchy and a similarity matrix containing pairwise similarity values of the proteins. A…

Cited by 2 publications (2 citation statements)
References 9 publications
“…The proposed method underwent a series of benchmark tests using synthetic test data sets of sizes ranging from 10 thousand to one million items. Data sets were generated using the method indicated in [15]. For each data size, 21 instances were created and the one with median density was chosen for the test.…”
Section: Results (mentioning)
confidence: 99%
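
The selection step quoted above (create 21 instances per data size, keep the one with median density) can be illustrated with a minimal sketch. It is not the cited authors' code. The generator `random_similarity_matrix`, its parameter `p`, and the definition of density as the fraction of nonzero off-diagonal entries are all assumptions made here for illustration; the quoted statement does not define them.

```python
import random


def density(matrix):
    """Fraction of off-diagonal entries that are nonzero (assumed definition)."""
    n = len(matrix)
    if n < 2:
        return 0.0
    nonzero = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and matrix[i][j] != 0.0
    )
    return nonzero / (n * (n - 1))


def random_similarity_matrix(n, p, rng):
    """Hypothetical stand-in generator, not the method of [15].

    Builds a symmetric n x n matrix in which each pair of items receives
    a nonzero similarity with probability p.
    """
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
        for j in range(i + 1, n):
            if rng.random() < p:
                value = rng.uniform(0.1, 1.0)  # strictly positive similarity
                m[i][j] = value
                m[j][i] = value
    return m


def pick_median_density(instances):
    """Return the instance whose density is the median of the collection.

    With an odd number of instances (21 in the quoted setup) the median
    is always one of the instances themselves.
    """
    ranked = sorted(instances, key=density)
    return ranked[len(ranked) // 2]


if __name__ == "__main__":
    rng = random.Random(0)
    instances = [random_similarity_matrix(50, 0.2, rng) for _ in range(21)]
    chosen = pick_median_density(instances)
    print(f"median density: {density(chosen):.4f}")
```

Using an odd instance count avoids having to average two middle values, so the chosen test set is always an actual generated instance.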
“…This change can significantly upgrade the size of processable data sets, and may also improve processing speed. The proposed method will be validated using large synthetic protein data sets derived from the SCOP95 database [2,10,14] using our method presented in [15].…”
Section: Introduction (mentioning)
confidence: 99%