Athanasios Papagelis scite author profile

When genetic algorithms are used to evolve decision trees, key tree-quality parameters can be computed recursively and reused across generations of partially similar decision trees. Simply storing instance indices at the leaves is enough for fitness to be computed piecewise in a lossless fashion. We show the derivation of the (substantial) expected speedup on two bounding-case problems and trace this attractive property of lossless fitness inheritance to the divide-and-conquer nature of decision trees. The theoretical results are supported by experimental evidence.
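The leaf-caching idea in the abstract can be illustrated with a small example. This is a minimal sketch under assumptions that are ours, not the paper's:

- splits are binary thresholds;
- fitness is the number of correctly classified training instances;
- crossover replaces one subtree of the first parent with a copy of a donor subtree.

The names `Leaf`, `Node`, `route`, `correct`, `reaching` and `graft` are hypothetical. Each leaf keeps the indices of the training instances that reach it. As a result, only the instances entering the replaced subtree are re-routed, and the child's fitness equals what a full re-evaluation would give.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class Leaf:
    label: int
    # Cached state: indices of the training instances that reach this leaf.
    indices: List[int] = field(default_factory=list)


@dataclass
class Node:
    attr: int         # index of the tested attribute
    threshold: float  # instances with x[attr] <= threshold go left
    left: Tree
    right: Tree


Tree = Union[Leaf, Node]


def route(tree: Tree, X, indices: Sequence[int]) -> None:
    """Push instance indices down the subtree and store them at its leaves."""
    if isinstance(tree, Leaf):
        tree.indices = list(indices)
        return
    route(tree.left, X, [i for i in indices if X[i][tree.attr] <= tree.threshold])
    route(tree.right, X, [i for i in indices if X[i][tree.attr] > tree.threshold])


def correct(tree: Tree, y) -> int:
    """Correctly classified instances in a subtree, read off the cached leaf indices."""
    if isinstance(tree, Leaf):
        return sum(1 for i in tree.indices if y[i] == tree.label)
    return correct(tree.left, y) + correct(tree.right, y)


def reaching(tree: Tree) -> List[int]:
    """Indices of all instances that reach the root of this subtree."""
    if isinstance(tree, Leaf):
        return list(tree.indices)
    return reaching(tree.left) + reaching(tree.right)


def graft(parent: Node, side: str, donor: Tree, X, y, fitness: int) -> int:
    """Replace parent.left or parent.right with a copy of donor (in place) and
    return the new tree's fitness. Only instances that reached the replaced
    subtree are re-routed; all other leaves keep their cached indices."""
    old = getattr(parent, side)
    new = copy.deepcopy(donor)
    route(new, X, reaching(old))
    setattr(parent, side, new)
    return fitness - correct(old, y) + correct(new, y)


# Usage: four one-attribute instances; the right leaf initially predicts wrongly.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1]
A = Node(0, 2.5, Leaf(0), Leaf(0))
route(A, X, range(len(X)))
f_parent = correct(A, y)
f_child = graft(A, "right", Leaf(1), X, y, f_parent)
print(f_parent, f_child)  # 2 4
```

Because `graft` modifies `parent` in place, a GA would apply it to a copy of the first parent. The saving comes from re-routing only the instances returned by `reaching(old)` rather than the whole training set.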

Data set operations to hide decision tree rules

Kalles, Verykios, Feretzakis, et al., 2016

Abstract. This paper focuses on preserving the privacy of sensitive patterns when inducing decision trees. We adopt a record augmentation approach for hiding sensitive classification rules in binary datasets. Such a hiding methodology is preferred over other heuristic solutions, such as output perturbation or cryptographic techniques that restrict the usability of the data, since the raw data itself is readily available for public use. We present some key lemmas related to the hiding process, and we demonstrate the methodology with an example and an indicative experiment using a prototype hiding tool.

INTRODUCTION

Privacy-preserving data mining [1] is a fairly recent research area that tries to alleviate the problems that the use of data mining algorithms poses to the privacy of the data subjects recorded in the data and to the information or knowledge hidden in these piles of data. Agrawal and Srikant [2] were the first to consider the induction of decision trees from anonymized data that had been adequately corrupted with noise to withstand privacy attacks. The generic strand of knowledge hiding research [3] has led to specific algorithms for hiding classification rules, such as noise addition through a data swapping process [4]. One key target area concerns individual data privacy and aims to protect the integrity of individual database records, preventing the re-identification of individuals or characteristic groups of people through data inference attacks. Another key area is sensitive rule hiding, the subject of this paper, which deals with the protection of sensitive patterns that arise from the application of data mining techniques. Of course, all privacy-preservation techniques strive to maintain the information quality of the data.

The main representative of statistical approaches [5] adopts a parsimonious downgrading technique to determine whether the loss of functionality associated with not downgrading the data is worth the extra confidentiality.
Reconstruction techniques involve redesigning the public dataset [6][7] from the non-sensitive rules produced by algorithms such as C4.5 [8] and RIPPER [9]. Perturbation-based techniques involve several operations:

- modifying transactions so that they support only non-sensitive rules [10];
- removing tuples associated with sensitive rules [11];
- suppressing certain attribute values [12];
- redistributing the tuples that support sensitive patterns so as to maintain the ordering of the rules [13].

In this paper, we propose a series of techniques that efficiently protect sensitive knowledge patterns from disclosure in classification rule mining. We aim to hide sensitive rules without compromising the information value of the entire dataset. After an expert selects the sensitive rules, we modify class labels at the tree node corresponding to the tail of the sen...

Affiliations: (1) School of Science and Technology, Hellenic Open University, Patras, Greece, email: kalles@eap.gr, verykios@eap.gr, georgios.feretzakis@ac.eap.gr; (2) Epignosis Ltd, Athens, Greece, email: papagel@efrontlearning.net
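A small example can show what record augmentation means at a single rule. This is a rough sketch under stated assumptions, not the paper's procedure (the excerpt describes working on the tree node at the tail of the sensitive path, and that is not reproduced here). We assume:

- the class is binary;
- a sensitive rule is a conjunction of attribute tests (the path to a leaf) together with the class it predicts;
- the rule counts as hidden once the opposite class is a strict majority among the records the rule covers.

The functions `covers`, `records_needed` and `augment` are hypothetical, as is the choice to reuse covered records as templates for the added ones.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A record is (tuple of binary attribute values, binary class label).
Record = Tuple[Tuple[int, ...], int]


def covers(x: Tuple[int, ...], antecedent: Dict[int, int]) -> bool:
    """True if the record's attributes satisfy every test in the rule antecedent."""
    return all(x[attr] == value for attr, value in antecedent.items())


def records_needed(data: Sequence[Record], antecedent: Dict[int, int], cls: int) -> int:
    """Smallest k such that appending k opposite-class records covered by the
    antecedent makes the opposite class a strict majority among covered records."""
    counts = Counter(label for x, label in data if covers(x, antecedent))
    if counts[cls] == 0:
        return 0  # the rule has no support, so there is nothing to hide
    return max(0, counts[cls] - counts[1 - cls] + 1)


def augment(data: Sequence[Record], antecedent: Dict[int, int], cls: int) -> List[Record]:
    """Return a copy of data with just enough opposite-class records appended."""
    k = records_needed(data, antecedent, cls)
    # Hypothetical choice: reuse attribute vectors of covered records as templates,
    # so every added record still satisfies the antecedent.
    templates = [x for x, _ in data if covers(x, antecedent)]
    added = [(templates[j % len(templates)], 1 - cls) for j in range(k)]
    return list(data) + added


# Usage: hide the rule "attribute 0 == 1 -> class 1" (3 supporting vs 1 opposing record).
data = [((1, 0), 1), ((1, 1), 1), ((1, 0), 1), ((1, 1), 0), ((0, 0), 0), ((0, 1), 0)]
rule = {0: 1}
print(records_needed(data, rule, 1))  # 3
hidden = augment(data, rule, 1)
```

Appending records also changes the class counts at every ancestor of the sensitive leaf. A tree re-induced from the augmented data may therefore choose different splits, and this sketch does not address that.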

Stable Decision Trees: Using Local Anarchy for Efficient Incremental Learning

Kalles, Papagelis, 2000. Int. J. Artif. Intell. Tools

Algorithmic Aspects of Web Intelligent Systems

Kalles, Papagelis, Zaroliagis, 2003