2004
DOI: 10.1007/978-3-540-24741-8_9

LIMBO: Scalable Clustering of Categorical Data

Cited by 197 publications (180 citation statements)
References: 12 publications
“…A text document can be represented either in the form of binary data, when we use the presence or absence of a word in the document in order to create a binary vector. In such cases, it is possible to directly use a variety of categorical data clustering algorithms [10,41,43] on the binary representation. A more enhanced representation would include refined weighting methods based on the frequencies of the individual words in the document as well as frequencies of words in an entire collection (e.g., TF-IDF weighting [82]).…”
Section: Document Classification (mentioning; confidence: 99%)
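A minimal sketch of the two document representations described in the quotation above: a binary presence/absence vector, which a categorical clustering algorithm can consume directly, and a TF-IDF weighted vector. It assumes whitespace-tokenized documents and one common TF-IDF variant (raw term count times log(N / document frequency)); the function names and sample documents are hypothetical and not taken from the cited works.

```python
import math
from collections import Counter

def build_vocab(docs):
    """Sorted list of every distinct word across the collection."""
    return sorted({w for d in docs for w in d})

def binary_vectors(docs, vocab):
    """1 if the word is present in the document, 0 if it is absent."""
    vectors = []
    for d in docs:
        present = set(d)
        vectors.append([1 if w in present else 0 for w in vocab])
    return vectors

def tfidf_vectors(docs, vocab):
    """Raw term count in the document times log(N / document frequency)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # number of documents containing each word
    vectors = []
    for d in docs:
        tf = Counter(d)  # missing words count as 0
        vectors.append([tf[w] * math.log(n / df[w]) for w in vocab])
    return vectors

docs = [
    "categorical data clustering".split(),
    "clustering text data data".split(),
    "binary vectors for text".split(),
]
vocab = build_vocab(docs)
binary = binary_vectors(docs, vocab)    # input for a categorical clustering algorithm
weighted = tfidf_vectors(docs, vocab)   # frequency-aware alternative
```

Under this variant a word that occurs in every document receives weight log(1) = 0, which is one way the weighted representation carries information that the plain binary vector does not.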
“…Traditional methods for clustering have generally focussed on the case of quantitative data [44,71,50,54,108], in which the attributes of the data are numeric. The problem has also been studied for the case of categorical data [10,41,43], in which the attributes may take on nominal values. A broad overview of clustering (as it relates to generic numerical and categorical data) may be found in [50,54].…”
Section: Introduction (mentioning; confidence: 99%)
“…It [is] composed of the attribute values with high co-occurrence. In the statistical categorical clustering algorithms [12], [13] such as COOLCAT and LIMBO, data points are grouped based on the statistics. In algorithm COOLCAT, data points are separated in such a way that the expected entropy of the whole arrangements is minimized.…”
Section: Related Work (mentioning; confidence: 99%)
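The expected-entropy criterion attributed to COOLCAT above can be sketched as follows. This assumes, as a simplification, that attributes are independent, so a cluster's entropy is the sum of its per-attribute entropies, and it shows only a greedy step that places one new record in whichever cluster gives the lowest expected entropy for the whole clustering. The helper names are hypothetical; this is an illustration of the criterion, not the authors' implementation.

```python
import math
from collections import Counter

def cluster_entropy(cluster):
    """Sum of per-attribute entropies (bits), assuming independent attributes."""
    n = len(cluster)
    if n == 0:
        return 0.0
    total = 0.0
    for values in zip(*cluster):  # transpose records into attribute columns
        for count in Counter(values).values():
            p = count / n
            total -= p * math.log2(p)
    return total

def expected_entropy(clusters):
    """Size-weighted average of cluster entropies over the whole arrangement."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) / n * cluster_entropy(c) for c in clusters)

def assign(point, clusters):
    """Put `point` into the cluster that minimizes the expected entropy."""
    best_k, best_score = None, math.inf
    for k in range(len(clusters)):
        # Try the point in cluster k without mutating the current clustering.
        trial = [c + [point] if i == k else c for i, c in enumerate(clusters)]
        score = expected_entropy(trial)
        if score < best_score:
            best_k, best_score = k, score
    clusters[best_k].append(point)
    return best_k

clusters = [
    [("red", "small"), ("red", "small")],
    [("blue", "large"), ("blue", "large")],
]
chosen = assign(("red", "small"), clusters)  # → 0
```

Placing the record in cluster 0 keeps every cluster pure (expected entropy 0), whereas cluster 1 would mix colors and sizes and raise the entropy.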
“…The K-Modes [3] algorithm is an extension of the K-means algorithm for categorical data. General description: The K-Modes algorithm was designed to group large sets of categorical data and its purpose is to obtain K-modes representing the data set and minimizing the criterion function.…”
Section: K-modes Algorithm (mentioning; confidence: 99%)
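A compact sketch of the K-Modes idea quoted above, assuming the simple matching dissimilarity (the number of attributes whose values differ) and modes built from the most frequent value of each attribute. The initialization (the first k records) is an arbitrary choice for illustration, and the function names are hypothetical. The returned cost is the sum of record-to-mode dissimilarities, the quantity the iterations try to reduce.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: count of attributes with different values."""
    return sum(x != y for x, y in zip(a, b))

def mode_of(records):
    """Most frequent value per attribute (ties go to the first value seen)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))

def k_modes(records, k, max_iter=100):
    modes = [tuple(r) for r in records[:k]]  # naive initialization: first k records
    labels = None
    for _ in range(max_iter):
        # Assign each record to its nearest mode.
        new_labels = [min(range(k), key=lambda j: mismatch(r, modes[j])) for r in records]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = mode_of(members)
    cost = sum(mismatch(r, modes[lab]) for r, lab in zip(records, labels))
    return labels, modes, cost

records = [
    ("red", "small", "round"),
    ("blue", "large", "square"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("red", "medium", "round"),
]
labels, modes, cost = k_modes(records, k=2)
```

Replacing means with modes and Euclidean distance with mismatch counts is what lets the K-means style loop run on nominal values.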
“…This constitutes a frequent problem in data mining applications, which work with high volumes of data. The presence of categorical data is also frequent. There are clustering algorithms [3] [4] [5] that work with large databases and categorical data, like ROCK [6] clustering algorithm, which deals with the size of databases by working with a database random sample. However, the algorithm is highly impacted by [the] size of the sample and randomness.…”
Section: Introduction (mentioning; confidence: 99%)