2016
DOI: 10.1109/tnnls.2015.2436432

A New Distance Metric for Unsupervised Learning of Categorical Data

Abstract: Distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. In general, measuring distance for numerical data is a tractable task, but it could be a nontrivial problem for categorical data sets. This paper, therefore, presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values from one attribute measured by this metric is determined by both …

Citations: cited by 86 publications (24 citation statements)
References: 34 publications
“…The result illustrated the presented approach offered better generalization. A new distance metric for processing the categorical data was utilized in this work by using an unsupervised learning technique [29]. Also, various distance metrics have been investigated in this work, which included hamming distance, modified value difference metric, Ahmad's distance metric, association based distance metric, and content-based distance metric.…”
Section: Clustering
Citation type: mentioning
confidence: 99%
“…The main drawback of metrics based on co-occurrence is the assumption of an intrinsic dependency between attributes without considering their relevance. The work presented by Ienco, Pensa & Meo (2012) and Jia, Cheung & Liu (2015) use the notion of contexts to evaluate pairs of categories. A context is an additional dimension used to determine the similarity between pairs.…”
Section: Patient Similarity and Distance Measures For Categorical Events
Citation type: mentioning
confidence: 99%
“…This approach yields three main kinds of distance relation. One is based on probability, which includes similarity relations that are information-theoretic centered, for example [2][3][4][5][6]; the next is based on the attribute space, for example [7][8][9]; and the other amounts to a specialization of a standard measure, such as Euclidean or Manhattan distance. All these measures overlook attribute interdependence, which, as noted in [10], may provide valuable information when capturing per-attribute object similarity.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
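
For orientation, the Hamming distance named in the citation statements above is the simplest of the listed baselines for categorical records. The sketch below is not the metric proposed in the paper. It only illustrates the baseline, and the records and attribute names are hypothetical.

```python
from typing import Sequence


def hamming_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attributes on which two categorical records disagree."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical records with three categorical attributes (colour, shape, size).
r1 = ("red", "round", "small")
r2 = ("red", "square", "small")
r3 = ("blue", "square", "large")

d12 = hamming_distance(r1, r2)  # records differ on shape only
d13 = hamming_distance(r1, r3)  # records differ on all three attributes
```

Every mismatch counts equally here, and each attribute is compared in isolation, so the measure cannot reflect how often a value occurs or how attributes depend on one another.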