Supervised and Unsupervised Discretization of Continuous Features

Dougherty, James; Kohavi, Ron; Sahami, Mehran

doi:10.1016/b978-1-55860-377-6.50032-3

Cited by 1,372 publications

(930 citation statements)

References 16 publications

Supporting

Mentioning

866

Contrasting

Unclassified

Order By: Relevance

“…For some tables, taking into account the small number of their objects, we have adopted the approach based on five-fold crossvalidation (CV − 5). The obtained results (Table 3) can be compared with those reported in [21,69] (Table 2). For predicting decisions on new cases we apply only decision rules generated either by the decision tree (using hyperplanes) or by rules generated in parallel with discretization.…”

Section: Feature Extraction: Discretization and Symbolic Attribute Vamentioning

confidence: 52%

“…For predicting decisions on new cases we apply only decision rules generated either by the decision tree (using hyperplanes) or by rules generated in parallel with discretization. For some tables the classification quality of our algorithm is better than that of the C4.5 or Naive-Bayes induction algorithms [100] even when used with different discretization methods [21,69,15].…”

Section: Feature Extraction: Discretization and Symbolic Attribute Vamentioning

confidence: 99%

See 1 more Smart Citation

Rough Sets and Rough Logic: A KDD Perspective

Pawlak

Polkowski

Skowron

2000

Rough Set Methods and Applications

View full text Add to dashboard Cite

Basic ideas of rough set theory were proposed by Zdzis law Pawlak [85,86] in the early 1980's. In the ensuing years, we have witnessed a systematic, world-wide growth of interest in rough sets and their applications.The main goal of rough set analysis is induction of approximations of concepts. This main goal is motivated by the basic fact, constituting also the main problem of KDD, that languages we may choose for knowledge description are incomplete. A fortiori, we have to describe concepts of interest (features, properties, relations etc.) not completely but by means of their reflections (i.e. approximations) in the chosen language. The most important issues in this induction process are:-construction of relevant primitive concepts from which approximations of more complex concepts are assembled, -measures of inclusion and similarity (closeness) on concepts, -construction of operations producing complex concepts from the primitive ones.Basic tools of rough set approach are related to concept approximations. They are defined by approximation spaces. For many applications, in particular for KDD problems, it is necessary to search for relevant approximation spaces in the large space of parameterized approximation spaces. Strategies for tuning parameters of approximation spaces are crucial for inducing concept approximations of high quality.Methods proposed in rough set approach are kin to general methods used to solve Knowledge Discovery and Data Mining (KDD) problems like feature selection, feature extraction (e.g. discretization or grouping of symbolic value),

show abstract

Section: Feature Extraction: Discretization and Symbolic Attribute Vamentioning

confidence: 52%

Section: Feature Extraction: Discretization and Symbolic Attribute Vamentioning

confidence: 99%

Rough Sets and Rough Logic: A KDD Perspective

Pawlak

Polkowski

Skowron

2000

Rough Set Methods and Applications

View full text Add to dashboard Cite

show abstract

“…Discretization has been widely studied from both a general point of view [26,27] and aimed specifically at BNs [28,29] and classification problems [30,31]. Discretization amounts to replacing a continuous variable X in a model by its discrete counterpart X 0 .…”

Section: Discretizationmentioning

confidence: 99%

Inference in hybrid Bayesian networks

Langseth

Nielsen

Rumí

et al. 2009

Reliability Engineering & System Safety

View full text Add to dashboard Cite

a b s t r a c tSince the 1980s, Bayesian networks (BNs) have become increasingly popular for building statistical models of complex systems. This is particularly true for boolean systems, where BNs often prove to be a more efficient modelling framework than traditional reliability techniques (like fault trees and reliability block diagrams). However, limitations in the BNs' calculation engine have prevented BNs from becoming equally popular for domains containing mixtures of both discrete and continuous variables (the so-called hybrid domains). In this paper we focus on these difficulties, and summarize some of the last decade's research on inference in hybrid Bayesian networks. The discussions are linked to an example model for estimating human reliability.

show abstract

“…Our encoding of ordinal features into binary features is reminiscent of machine learning algorithms for discretizing a continuous (i.e., real-valued) feature t k (see [5] for a survey and experimental comparison of less-than-recent methods, and [6,7] for two more recent surveys). These algorithms attempt to optimally subdivide the interval [α k , β k ] on which a feature t k ranges (where the interval [α k , β k ] may or may not be the same for all features in the feature space) into…”

Section: Related Workmentioning

confidence: 99%

Encoding Ordinal Features into Binary Features for Text Classification

Esuli

Sebastiani

2009

Lecture Notes in Computer Science

View full text Add to dashboard Cite

Abstract. We propose a method by means of which supervised learning algorithms that only accept binary input can be extended to use ordinal (i.e., integer-valued) input. This is much needed in text classification, since it becomes thus possible to endow these learning devices with term frequency information, rather than just information on the presence/absence of the term in the document. We test two different learners based on "boosting", and show that the use of our method allows them to obtain effectiveness gains. We also show that one of these boosting methods, once endowed with the representations generated by our method, outperforms an SVM learner with tfidf-weighted input.

show abstract

Supervised and Unsupervised Discretization of Continuous Features

Cited by 1,372 publications

References 16 publications

Rough Sets and Rough Logic: A KDD Perspective

Rough Sets and Rough Logic: A KDD Perspective

Inference in hybrid Bayesian networks

Encoding Ordinal Features into Binary Features for Text Classification

Contact Info

Product

Resources

About