2000
DOI: 10.1145/380995.381030

The UCI KDD archive of large data sets for data mining research and experimentation

Abstract: Advances in data collection and storage have allowed organizations to create massive, complex, and heterogeneous databases, which have stymied traditional methods of data analysis. This has led to the development of new analytical tools that often combine techniques from a variety of fields such as statistics, computer science, and mathematics to extract meaningful knowledge from the data. To support research in this area, UC Irvine has created the UCI Knowledge Discovery in Databases (KDD) Archive…

Cited by 203 publications (116 citation statements) | References 5 publications

“…We run our experiments on 29 benchmark data sets from UCI machine learning repository (Blake and Merz 1998) and KDD archive (Bay 1999). This experimental suite comprises 3 parts.…”
Section: Data (mentioning)
confidence: 99%
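
For context on how UCI benchmarks like those in the excerpt above are typically obtained, here is a minimal, hedged sketch that pulls one classic UCI dataset through scikit-learn's OpenML mirror; the choice of the "adult" dataset and the use of fetch_openml are illustrative assumptions, not the tooling used by the cited study.

```python
# Illustrative sketch only: fetch a classic UCI benchmark ("adult", a.k.a.
# Census Income) via scikit-learn's OpenML mirror. The cited experiments drew
# their 29 data sets directly from the UCI ML repository and the UCI KDD
# archive; this is merely a convenient stand-in for obtaining comparable data.
from sklearn.datasets import fetch_openml

X, y = fetch_openml("adult", version=2, return_X_y=True, as_frame=True)

print(X.shape)           # (number of examples, number of attributes)
print(y.value_counts())  # class distribution of the income label
```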
“…In order to explain how the techniques introduced in this paper can practically improve the efficiency of rule discovery, we do our experiments by applying the new algorithm to 10 databases chosen from the UCI Machine Learning repository [6] and the UCI KDD archives [3]. The databases are described in table 3.…”
Section: Experimental Evaluations (mentioning)
confidence: 99%
“…where D_1(i, y) is given by (1), and a(θ, i) represents the number of examples which are squashed into the leaf i. (c) Data Squashing: construct a novel SF tree from the training examples.…”
Section: …Of the Final Round T and a Classification Model (b) Update… (mentioning)
confidence: 99%
“…We employed the KDD Cup 1999 data set [1], from which we produced several data sets. Since it is difficult to introduce a distance measure of data squashing for a nominal attribute and binary attributes can be misleading in calculating a distance, we deleted such attributes before the experiments.…”
Section: Experimental Condition (mentioning)
confidence: 99%
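
The excerpt above describes deleting nominal and binary attributes from the KDD Cup 1999 data before distance-based data squashing; the sketch below shows one way that filtering could look in pandas. The file name, the pandas-based approach, and the 0/1 test for binary columns are assumptions for illustration, not details from the cited paper.

```python
# Hedged sketch: remove nominal and binary attributes from a KDD Cup 1999
# sample so that distances used in data squashing are computed only over
# continuous features. File name and column handling are assumptions.
import pandas as pd

# A locally downloaded 10%-sample of KDD Cup 1999 (path is hypothetical).
df = pd.read_csv("kddcup.data_10_percent", header=None)

# Nominal attributes show up as object-typed columns (e.g. protocol, service).
nominal_cols = list(df.select_dtypes(include="object").columns)

# Binary attributes: numeric columns whose observed values are only 0 and 1.
binary_cols = [c for c in df.select_dtypes(include="number").columns
               if set(df[c].unique()) <= {0, 1}]

# Keep only continuous attributes before computing distances for squashing.
continuous = df.drop(columns=nominal_cols + binary_cols)
print(continuous.shape)
```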