1996
DOI: 10.1007/3-540-61442-7_8

Speeding up knowledge discovery in large relational databases by means of a new discretization algorithm

Cited by 11 publications (6 citation statements)
References 7 publications
“…Thus they are expected to outperform previous methods, especially when learning from large data. It is desirable that a machine learning algorithm maximize the information it derives from large data sets, since increasing the size of a data set can provide a domain-independent way of achieving higher accuracy (Freitas and Lavington 1996; Provost and Aronis 1996). This is especially important since large data sets with high-dimensional attribute spaces and huge numbers of instances are increasingly used in real-world applications, and naive-Bayes classifiers are particularly attractive to these applications because of their space and time efficiency.…”
Section: Results
confidence: 99%
“…Then, using the statistical χ² test, the adjacent pair of intervals with the lowest χ² value is merged into one interval, and this process is repeated until no adjacent pair has a χ² value below the predetermined threshold (Kerber 1992; Freitas and Lavington 1996).…”
Section: ChiMerge
confidence: 99%
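
The bottom-up merging loop described in this statement is straightforward to sketch. The following Python fragment is a minimal illustration of ChiMerge-style merging, not the cited paper's implementation; the names (chi2, chimerge) and the list-of-class-frequency-counts representation are assumptions made for this sketch.

def chi2(a, b):
    # Chi-square statistic over the class-frequency counts of two
    # adjacent intervals (a[c] = count of class c in the first interval).
    total = sum(a) + sum(b)
    chi = 0.0
    for counts in (a, b):
        n = sum(counts)
        for c in range(len(a)):
            expected = n * (a[c] + b[c]) / total
            if expected > 0:
                chi += (counts[c] - expected) ** 2 / expected
    return chi

def chimerge(boundaries, freqs, threshold):
    # boundaries: sorted interval edges (len(freqs) + 1 values);
    # freqs: one class-frequency list per interval;
    # threshold: chi-square value from a table at the chosen significance
    # level, with k - 1 degrees of freedom for k classes (Kerber 1992).
    while len(freqs) > 1:
        scores = [chi2(freqs[i], freqs[i + 1]) for i in range(len(freqs) - 1)]
        i = min(range(len(scores)), key=scores.__getitem__)
        if scores[i] >= threshold:
            break  # every adjacent pair now exceeds the threshold
        # Merge intervals i and i+1: add counts, drop the shared boundary.
        freqs[i] = [x + y for x, y in zip(freqs[i], freqs[i + 1])]
        del freqs[i + 1]
        del boundaries[i + 1]
    return boundaries

For a two-class problem, for example, a 95% significance level gives a threshold of 3.84 (one degree of freedom).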
“…We have chosen supervised techniques because, using classification information, we can reduce the probability of grouping different classes in the same interval. [FAL96], an information-theoretic algorithm, substitutes the ChiMerge/StatDisc statistical measures with an information loss function in a bottom-up iterative process. This approach is similar to the C4.5 local discretization process, but in order to apply it in a global algorithm a correction factor needs to be used.…”
Section: Discretization Algorithm
confidence: 99%
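
Under this statement's description, the bottom-up process has the same shape as ChiMerge but ranks adjacent pairs by an information loss function instead of χ². The cited paper's exact loss function and correction factor are not given in this excerpt, so the entropy-based measure below is only an assumed stand-in for illustration; plugging it in place of chi2 in the loop sketched earlier, with a loss threshold as the stopping rule, reproduces the general form of such a process.

import math

def entropy(counts):
    # Class entropy of one interval's class-frequency counts.
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c) if n else 0.0

def info_loss(a, b):
    # Weighted increase in class entropy caused by merging two adjacent
    # intervals: always >= 0, and small when the intervals' class
    # distributions are similar (i.e. merging loses little information).
    merged = [x + y for x, y in zip(a, b)]
    return (sum(a) + sum(b)) * entropy(merged) \
        - sum(a) * entropy(a) - sum(b) * entropy(b)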