2014
DOI: 10.1007/s13748-014-0048-3

Data clustering using hidden variables in hybrid Bayesian networks

Abstract: In this paper, we analyze the problem of data clustering in domains where discrete and continuous variables coexist. We propose the use of hybrid Bayesian networks with naïve Bayes structure and hidden class variable. The model integrates discrete and continuous features by representing the conditional distributions as mixtures of truncated exponentials (MTEs). The number of classes is determined through an iterative procedure based on a variation of the data augmentation algorithm. The new model is compared …
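The conditional densities named in the abstract, mixtures of truncated exponentials (MTEs), are piecewise potentials of the form f(x) = a0 + Σ_i a_i exp(b_i x) on each interval of a split of the domain. As a rough one-dimensional illustration only (the split points and coefficients below are made up, the function is not normalized, and this is not the paper's implementation), such a potential can be evaluated like this:

```python
import math

# Minimal sketch of a one-dimensional mixture of truncated exponentials (MTE).
# On each interval [lo, hi) the potential has the form
#   f(x) = a0 + sum_i a_i * exp(b_i * x)
# Split points and coefficients are illustrative assumptions, not taken from
# the paper, and the result is not normalized to integrate to 1.
MTE_PIECES = [
    # (lo, hi, a0, [(a_i, b_i), ...])
    (0.0, 1.0, 0.2, [(0.5, 1.0)]),
    (1.0, 2.0, 0.1, [(1.8, -0.8)]),
]

def mte_potential(x: float) -> float:
    """Evaluate the MTE potential at x (zero outside its support)."""
    for lo, hi, a0, terms in MTE_PIECES:
        if lo <= x < hi:
            return a0 + sum(a * math.exp(b * x) for a, b in terms)
    return 0.0

if __name__ == "__main__":
    for x in (0.25, 0.75, 1.5):
        print(f"f({x}) = {mte_potential(x):.4f}")
```

The appeal of the MTE representation is that discrete and continuous variables can share one potential format, and MTE potentials are closed under the combination and marginalization operations used in Bayesian network inference.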

Cited by 12 publications (6 citation statements) · References 25 publications · Citing publications span 2015–2024

Citation statements (ordered by relevance):
“…This idea is not new, and has been used, for instance, for unsupervised clustering [21, 31], to improve the performance (accuracy) of the base classifier [32], relax some of the independence statements, increasing the classifier modeling capability [33, 34, 35], or obtain models for efficient probabilistic inference [36].…”
Section: Hidden Naive Bayes for Label Ranking
confidence: 99%
“…The proposed probabilistic LR-classifier relies on the use of a hybrid Bayesian network [21] where different probability distributions are used to conveniently model variables of a different nature: Multinomial for discrete variables, Gaussian for numerical variables, and Mallows for permutations [22]. The Mallows probability distribution is usually considered to model a set of permutations and, in fact, is the core of the decision tree algorithm (LRT) proposed in [2].…”
Section: Introduction
confidence: 99%
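For context on the Mallows model named in that quotation: it assigns each permutation π a probability proportional to exp(-θ·d(π, π0)), where π0 is a central ranking, θ ≥ 0 a spread parameter, and d typically the Kendall tau distance. The sketch below is ours rather than from the cited works, and it normalizes by brute-force enumeration, which is only feasible for small n:

```python
from itertools import permutations
from math import exp

def kendall_tau(pi: tuple, sigma: tuple) -> int:
    """Count item pairs ranked in opposite order by the two permutations."""
    n = len(pi)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (pi.index(i) < pi.index(j)) != (sigma.index(i) < sigma.index(j))
    )

def mallows_pmf(pi: tuple, center: tuple, theta: float) -> float:
    """P(pi | center, theta) proportional to exp(-theta * d(pi, center)).

    Normalized by brute-force enumeration of all n! permutations,
    so this is illustrative only and limited to small n.
    """
    n = len(center)
    norm = sum(exp(-theta * kendall_tau(s, center))
               for s in permutations(range(n)))
    return exp(-theta * kendall_tau(pi, center)) / norm

if __name__ == "__main__":
    center = (0, 1, 2)
    for pi in permutations(range(3)):
        print(pi, round(mallows_pmf(pi, center, theta=1.0), 3))
```

Probability mass decays exponentially with distance from the central ranking, which is what makes the model a natural analogue of a Gaussian for permutation-valued variables.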
“…In this regard, the goal of an unsupervised classifier is to find groups of elements based on their similarities. In this work, we follow the methodology proposed in [31, 38] (Algorithm 1), which details the specific steps and algorithms, implemented in the Elvira software [39]. In this approach, the class variable C is replaced by a hidden variable, H, whose values are initially missing.…”
Section: Unsupervised Classification Using Hybrid BNs
confidence: 99%
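As a concrete illustration of that loop, the sketch below alternates between sampling the hidden class H from its posterior and re-estimating the model from the completed data. It is a sketch under simplifying assumptions rather than the authors' implementation: Gaussian class-conditionals stand in for the paper's MTEs, the Elvira software is not involved, and the iterative search over the number of classes is omitted (k is fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

def data_augmentation_cluster(X: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Data-augmentation clustering with a naive Bayes model.

    The hidden class H starts as a random imputation; each iteration
    (i) re-estimates the parameters from the completed data and
    (ii) samples H from its posterior under the current model.
    Gaussian class-conditional densities are a stand-in for MTEs.
    """
    n, d = X.shape
    h = rng.integers(k, size=n)  # initial random imputation of the hidden H
    for _ in range(iters):
        # Re-estimate class priors and per-class Gaussian parameters.
        prior = np.array([(h == c).mean() + 1e-9 for c in range(k)])
        mu = np.array([X[h == c].mean(axis=0) if (h == c).any()
                       else X.mean(axis=0) for c in range(k)])
        sd = np.array([X[h == c].std(axis=0) + 1e-3 if (h == c).any()
                       else X.std(axis=0) + 1e-3 for c in range(k)])
        # Posterior of H under the naive Bayes model (log-space for stability).
        logp = np.log(prior) - (((X[:, None, :] - mu) / sd) ** 2 / 2
                                + np.log(sd)).sum(axis=2)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # Sample a completed value of H for every observation.
        h = np.array([rng.choice(k, p=pi) for pi in p])
    return h

if __name__ == "__main__":
    # Two well-separated synthetic groups; the labels should recover them.
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    print(data_augmentation_cluster(X, k=2))
```

To select the number of classes as in the paper, one would run this procedure for increasing k and keep the model that scores best under a likelihood-based criterion; that outer loop is left out here.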
“…2ii). They are based on the probabilistic clustering methodology using HBNs as proposed by Fernández et al. (2014), and implemented in the Elvira software (Elvira-Consortium, 2002). Fig.…”
Section: Sub-models Learning
confidence: 99%
“…A classification problem in which no information about the class variable is available (called an unsupervised classification or clustering problem) can be solved by a BN classifier (Aguilera et al., 2013; Anderberg, 1973; Fernández et al., 2014; Gieder et al., 2014). This soft-clustering methodology implies the partition of the data into groups in such a way that the observations belonging to one group are similar to each other but differ from the observations in the other groups.…”
Section: Introduction
confidence: 99%