2017
DOI: 10.5445/ksp/1000058749/22

On a Comprehensive Metadata Framework for Artificial Data in Unsupervised Learning

Cited by 1 publication (2 citation statements)
“…A blueprint of a novel device for simulating data for benchmarking in unsupervised learning has been designed by Dangl and Leisch (2017). This blueprint comprises the plan of a web repository and an accompanying R (R Core Team, 2017) package for the actual production of metadata objects and for the subsequent generation of data sets on a local computer.…”
Section: Issues (mentioning)
“…
• In case of simulated data, organize a fair comparison in terms of the relation between the methods under study and the data-generating mechanisms of the simulations, with fair meaning that one should not exclusively rely on mechanisms that unilaterally favor methods which explicitly or implicitly assume that these mechanisms are in place.
• Disclose full information on the data sets that are used (making use, whenever meaningful, of platforms such as GitHub or GitLab) in view of reproducibility (Dangl & Leisch, 2017; Donoho, 2010; Hofner et al., 2016; Peng, 2011) and of enabling follow-up research. This means that: for simulated data sets, provide implementable data-generating code with full information on cluster-specific parameters, the data-generating function, random seeds, the type and version of the random number generator, and so on; for empirical data sets, provide the full data sets, with sufficient detail on format, codes used to denote missing values, pre-processing, and so on.…”
Section: Recommendations (mentioning)
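The disclosure requirements for simulated data in the recommendation quoted above can be made concrete with a small sketch. It assumes an isotropic Gaussian mixture as the data-generating function and stores what is needed to regenerate the data (cluster-specific parameters, the generator's name, the random seed, the bit-generator type, and the NumPy version) in a JSON-serializable record. The names `make_metadata` and `gaussian_mixture` and the record's fields are hypothetical illustrations. This is not the metadata format or the R package designed by Dangl and Leisch (2017).

```python
import json

import numpy as np


def make_metadata(seed=2017):
    """Hypothetical metadata record for a simulated clustering data set."""
    return {
        "generator": "gaussian_mixture",  # name of the data-generating function
        "clusters": [  # cluster-specific parameters
            {"n": 50, "mean": [0.0, 0.0], "sd": 1.0},
            {"n": 30, "mean": [5.0, 5.0], "sd": 0.5},
        ],
        "seed": seed,  # random seed
        "rng": {  # type and version of the random number generator
            "bit_generator": "PCG64",
            "numpy_version": np.__version__,
        },
    }


def gaussian_mixture(meta):
    """Regenerate the data set described by `meta` (isotropic Gaussian clusters)."""
    # Rebuild the exact bit generator named in the record, seeded as recorded.
    bitgen = getattr(np.random, meta["rng"]["bit_generator"])(meta["seed"])
    rng = np.random.Generator(bitgen)
    points, labels = [], []
    for k, c in enumerate(meta["clusters"]):
        mean = np.asarray(c["mean"])
        # Draw c["n"] points around this cluster's mean with a common standard deviation.
        points.append(rng.normal(loc=mean, scale=c["sd"], size=(c["n"], mean.size)))
        labels.append(np.full(c["n"], k))
    return np.vstack(points), np.concatenate(labels)


meta = make_metadata()
record = json.dumps(meta, indent=2)  # text that can be shared next to the code
X1, y1 = gaussian_mixture(json.loads(record))
X2, y2 = gaussian_mixture(json.loads(record))
print(X1.shape, np.array_equal(X1, X2))
```

Regenerating from the serialized record reproduces the data exactly when the same NumPy version is used. Recording the version matters because NumPy does not guarantee that `Generator` streams remain identical across releases.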