Rule-based reasoning (RBR) and case-based reasoning (CBR) are two complementary approaches to building knowledge-based "intelligent" decision-support systems. RBR and CBR can be combined in three main ways: RBR first, CBR first, or some interleaving of the two. The NEST system, described in this paper, allows both components to be invoked separately and in arbitrary order. In addition to the traditional network of propositions and compositional rules, NEST also supports binary, nominal, and numeric attributes used to derive proposition weights; logical (no-uncertainty) and default (no-antecedent) rules; context expressions; integrity constraints; and cases. The inference mechanism supports both rule-based and case-based reasoning. Uncertainty processing, based on Hájek's algebraic theory, allows interval weights to be interpreted as a union of hypothetical cases, and a novel set of combination functions inspired by neural networks has been added. The system is implemented in two versions: stand-alone and web-based client-server. A user-friendly editor covering all of the above features is included.
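The abstract does not spell out the combination functions NEST uses. As a rough illustration only, one standard combining function from Hájek's algebraic theory of uncertainty composes two contribution weights in (-1, 1), and because it is monotone in each argument, interval weights can be combined endpoint-wise, which matches the "union of hypothetical cases" reading mentioned above. The function names here are hypothetical, not taken from NEST:

```python
def combine(x: float, y: float) -> float:
    """Compose two independent rule contributions, each in (-1, 1).
    One standard combining function from Hajek's algebraic theory;
    NEST's actual function set is not given in the abstract."""
    return (x + y) / (1 + x * y)

def combine_interval(u, v):
    """Interval weights [lo, hi] combine endpoint-wise: combine() is
    monotone in each argument, so the result interval covers every
    'hypothetical case' of point weights drawn from the inputs."""
    return (combine(u[0], v[0]), combine(u[1], v[1]))
```

Note that `combine` is commutative and associative, so the composed weight of a proposition does not depend on the order in which its supporting rules fire.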
Unlike the on-line discretization performed by many machine learning (ML) algorithms for building decision trees or decision rules, we propose off-line algorithms for discretizing numerical attributes and for grouping values of nominal attributes. The number of intervals produced by the discretization depends only on the data; the number of groups corresponds to the number of classes. Since both discretization and grouping are performed with respect to the goal classes, the algorithms are suitable only for classification/prediction tasks. As a side effect of the off-line processing, the number of objects in the datasets and the number of attributes may be reduced. It should also be mentioned that although the discretization procedure was originally designed for the KEX system, the algorithms also perform well in combination with other machine learning algorithms.
Purely symbolic machine learning (ML) algorithms can process only symbolic, categorical data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes; discretizing (categorizing) numerical attributes is therefore an important issue in ML, and quite a few discretization procedures already exist in the field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization corresponds to the KEX knowledge-acquisition algorithm. Since the categorization for KEX is done "off-line", before the KEX machine learning algorithm is run, it can also serve as a preprocessing step for other machine learning algorithms. The second discretization procedure is implemented in CN4, a substantial extension of the well-known CN2 machine learning algorithm. The range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as part of the class description. Experimental results compare the performance of KEX and CN4 on several well-known ML databases. To make the comparison more illustrative, we also used the discretization procedure of the MLC++ library. Other ML algorithms, such as ID3 and C4.5, were run in our experiments as well; the results are compared and discussed.
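The abstracts above do not give the discretization algorithms themselves. The following is a deliberately simplified sketch of the class-driven, off-line idea they describe (cut points chosen with respect to the goal classes, independently of any particular learner); it is not the actual KEX or CN4 procedure:

```python
def discretize(values, classes):
    """Greedy class-driven discretization (illustrative sketch, not
    the KEX/CN4 algorithm): sort (value, class) pairs and open a new
    interval whenever the goal class changes, so the number of cut
    points is determined by the data alone."""
    pairs = sorted(zip(values, classes))
    boundaries = []
    prev_cls = None
    for value, cls in pairs:
        if prev_cls is not None and cls != prev_cls:
            boundaries.append(value)  # cut point starts a new interval here
        prev_cls = cls
    return boundaries
```

A real procedure would additionally merge intervals whose class distributions do not differ significantly, which is what reduces the number of attribute values (and hence objects) as a side effect.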
A method for automatic knowledge acquisition from categorical data is described. Empirical implications are generated from the data according to their frequencies. Only those implications are inserted into the knowledge base being created whose validity in the data differs statistically significantly from the weight composed, by a PROSPECTOR-like inference mechanism, from the weights of the implications already present in the base. A comparison with classical machine learning algorithms is discussed. The method is implemented as part of the Knowledge EXplorer system.
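The insertion criterion can be sketched as follows. This is a minimal illustration under assumed details (the exact combining functions, weight scale, and statistical test used by Knowledge EXplorer are not given in the abstract; the normal approximation and the names below are my own):

```python
import math

def composed_weight(weights):
    """PROSPECTOR-like global combination of rule weights in (-1, 1)
    (illustrative choice of combining function)."""
    w = 0.0
    for x in weights:
        w = (w + x) / (1 + w * x)
    return w

def should_insert(observed_validity, existing_weights, n, z=1.96):
    """Insert a candidate implication only if its validity (relative
    frequency over n matching objects) differs significantly from the
    weight the base already composes for it -- a normal-approximation
    sketch of the significance test described in the abstract."""
    # map the composed weight from (-1, 1) back to a probability in (0, 1)
    expected = (composed_weight(existing_weights) + 1) / 2
    se = math.sqrt(expected * (1 - expected) / n) or 1e-9
    return abs(observed_validity - expected) > z * se
```

The point of the test is economy: an implication whose validity the existing rules already predict adds nothing, so only surprising implications enter the base.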
The aim of this chapter is to describe the goals, current results, and further plans of a long-term activity concerning the application of data mining and machine learning methods to a complex medical data set. The analyzed data set comes from a longitudinal study of atherosclerosis risk factors. The structure and main features of this data set, as well as the methodology of observing the risk factors, are introduced. The important first steps of the analysis of the atherosclerosis data are described in detail, together with a large set of analytical questions defined on the basis of the first results. Experience in solving these tasks is summarized, and further directions of analysis are outlined.
Acute heart failure (AHF) is a life-threatening, heterogeneous disease requiring urgent diagnosis and treatment. The clinical severity and medical procedures differ according to a complex interplay between the deterioration cause, the underlying cardiac substrate, and comorbidities. This study aimed to analyze the natural phenotypic heterogeneity of the AHF population and to evaluate the possibilities offered by clustering (an unsupervised machine-learning technique) in medical data assessment. We evaluated data from 381 AHF patients. Sixty-three clinical and biochemical features were assessed at admission and were included in the analysis after preprocessing. The K-medoids algorithm was implemented to create the clusters, and optimization based on the Davies-Bouldin index was used. The clustering was performed blinded to the outcome. The outcome associations were evaluated using Kaplan-Meier curves and Cox proportional-hazards regressions. The algorithm distinguished six clusters that differed significantly in 58 variables concerning, among others, etiology, clinical status, comorbidities, laboratory parameters, and lifestyle factors. The clusters also differed in one-year mortality (p = 0.002) and two-year mortality (p = 0.002). Using clustering techniques, we extracted six phenotypes of AHF patients with distinct clinical characteristics and outcomes. Our results can be valuable for future trial construction and customized treatment.
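The pipeline described above (K-medoids clustering evaluated with the Davies-Bouldin index) can be sketched in a few lines. This is a toy, self-contained version for illustration, not the study's implementation, which used 63 preprocessed features:

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.dist(a, b)

def kmedoids(points, k, iters=50, seed=0):
    """Toy K-medoids: alternate assigning points to the nearest medoid
    and re-choosing each medoid as the cluster member with the minimum
    total distance to the rest of its cluster."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist(p, medoids[j]))
            clusters[i].append(p)
        new = [min(c, key=lambda m: sum(dist(m, p) for p in c)) if c else medoids[i]
               for i, c in enumerate(clusters)]
        if new == medoids:  # converged
            break
        medoids = new
    return medoids, clusters

def davies_bouldin(medoids, clusters):
    """Davies-Bouldin index: mean over clusters of the worst ratio of
    within-cluster scatter to between-medoid separation (lower = better),
    usable to pick k or compare clusterings as in the study."""
    scatter = [sum(dist(p, m) for p in c) / len(c) if c else 0.0
               for m, c in zip(medoids, clusters)]
    k = len(medoids)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(medoids[i], medoids[j])
                     for j in range(k) if j != i)
    return total / k
```

K-medoids is preferred over K-means for clinical data because the cluster centers are real patients (medoids), which keeps the phenotypes interpretable; running the fit for several values of k and keeping the one with the lowest Davies-Bouldin index is one common way to set the number of clusters.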