A formal framework for data mining with association rules is introduced. The framework is based on a logical calculus of association rules which is enhanced by several formal tools. The enhancement allows the description of the whole data mining process, including formulation of analytical questions, application of an analytical procedure and interpretation of its results. The role of formalized domain knowledge is discussed.

We deal with association rules ϕ ≈ ψ where ϕ and ψ are general Boolean attributes derived from columns of analysed data matrices. Data matrices are results of observation of a finite number of objects. Rows of a data matrix correspond to observed objects, columns correspond to observed attributes. Each attribute has a finite number of possible values (i.e. categories). Principles of our approach to association rules and calculi of association rules are informally introduced in sections 2.1.1-2.1.3. A detailed description of logical calculi of association rules is provided in [23].
Boolean Attributes

Basic Boolean attributes are defined first. A basic Boolean attribute is an expression A(α) where A is an attribute and α is a subset of its categories. The expressions Sex(F) and Education(secondary, university) are examples of basic Boolean attributes, see the logical calculus ST of association rules, inspired by a real data set, introduced in section 2.
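The notion of a basic Boolean attribute can be illustrated by a minimal Python sketch. It assumes the usual reading that A(α) holds for an observed object when the object's value of A is one of the categories in α. The class name `BasicBooleanAttribute`, the method `holds` and the three-row data matrix (including the category `basic`) are hypothetical and serve only as an illustration; they are not taken from the data set of section 2.

```python
from typing import Dict, FrozenSet, List

# A data matrix: one dict per observed object (row), keyed by attribute (column).
# The rows below are invented for illustration only.
Row = Dict[str, str]
DATA_MATRIX: List[Row] = [
    {"Sex": "F", "Education": "secondary"},
    {"Sex": "M", "Education": "university"},
    {"Sex": "F", "Education": "basic"},
]


class BasicBooleanAttribute:
    """A(alpha): an attribute A together with a subset alpha of its categories."""

    def __init__(self, attribute: str, *categories: str) -> None:
        self.attribute = attribute
        self.categories: FrozenSet[str] = frozenset(categories)

    def holds(self, row: Row) -> bool:
        # Assumed semantics: true iff the row's value of A belongs to alpha.
        return row[self.attribute] in self.categories

    def __str__(self) -> str:
        return f"{self.attribute}({', '.join(sorted(self.categories))})"


sex_f = BasicBooleanAttribute("Sex", "F")
education = BasicBooleanAttribute("Education", "secondary", "university")

# Each basic Boolean attribute induces a column of truth values over the rows.
sex_column = [sex_f.holds(row) for row in DATA_MATRIX]
education_column = [education.holds(row) for row in DATA_MATRIX]

print(sex_f, sex_column)
print(education, education_column)
```

Representing α as a frozen set keeps the membership test cheap and makes the attribute immutable, which matches its role as a syntactic expression rather than as mutable data.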
A basic Boolean attribute A(α) is true in a row o of a data matrix M if α contains the value of the column of M corresponding to the attribute A in the row o. Each basic Boolean attribute is a Boolean attribute. If κ and λ are Boolean attributes, then ¬κ, κ ∧ λ, and κ ∨ λ are Boolean attributes. The expressions Sex(F) ∧ Education(secondary, university) and Diabetes(yes) ∨ Infarction(yes) are examples of Boolean attributes, see also section 2.2. Truth of the Boolean attributes ¬κ, κ ∧ λ, and κ ∨ λ is defined in the usual way: ¬κ is true in a row o of a data matrix M if and only if κ is false in o, κ ∧ λ is true in o if and only if both κ and λ are true in o, and κ ∨ λ is true in o if and only if κ or λ is true in o.

The symbol ≈ used in ϕ ≈ ψ is called a 4ft-quantifier. A function F_≈ mapping the set of all quadruples ⟨a, b, c, d⟩ of non-negative integers satisfying a + b + c + d > 0 into the set {0, 1} is associated with each 4ft-quantifier ≈. Examples of 4ft-quantifiers and their associated functions F_≈ are given in section 2.3. About forty 4ft-quantifiers are defined and studied in [23]. The associated functions of some of them apply suitable thresholds to various measures of interestingness of association rules, as defined e.g. in [5]. Other 4ft-quantifiers correspond to statistical hypothesis tests. The association rule ϕ ≈ ψ is true in a data matrix M if and only if F_≈(a, b, c, d) = 1, where F_≈ is the associated function of ≈ and ⟨a, b, c, d⟩ = 4ft(ϕ, ψ, M) is the four-fold table of ϕ and ψ in M: a is the number of rows of M satisfying both ϕ and ψ, b the number satisfying ϕ and not ψ, c the number satisfying ψ and not ϕ, and d the number satisfying neither.

J. Rauch / Formal Framework for Data Mining with Association Rules and Domain Knowledge - Overview 175

2.1.3. Calculus of Association Rules

Deduction rules of the form (ϕ ≈ ψ) / (ϕ′ ≈ ψ′) where ...