Synchronizing source and target databases is an important task in many database applications. There are instances when the synchronization of source and target databases must be driven from the target's side and involve no changes to the source's schema or triggers. We describe an algorithm for such synchronization. Our algorithm groups tuples into partitions and compares hashes of matching source and target partitions before synchronizing only those partitions whose hashes do not match. The hash comparisons decrease the number of tuples that must be exchanged when the source and target are nearly synchronized already. Two variants of full replication that differ on locking strategies are used as benchmarks. Empirical results show that our method outperforms both when there are few changes to the database and outperforms row-level locking when fewer than 70% of the partitions are changed.
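The partition-and-compare idea described above can be illustrated with a minimal sketch. The abstract does not say how tuples are assigned to partitions, which hash function is used, or how locking is handled, so everything here is an assumption made for illustration: in-memory dictionaries stand in for the source and target tables, the hypothetical helpers `partition_of`, `partition_hashes`, and `synchronize` are not the authors' names, partitions are formed by hashing the key, and SHA-256 is used as the partition hash. Locking is not modeled at all.

```python
import hashlib
from collections import defaultdict


def partition_of(key, num_partitions):
    """Assign a key to a partition (assumed scheme: SHA-256 of the key, modulo)."""
    # hashlib is used instead of the built-in hash(), which is salted per process for strings.
    digest = hashlib.sha256(repr(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_hashes(rows, num_partitions):
    """Return {partition id: hex digest} over the tuples in each non-empty partition."""
    buckets = defaultdict(list)
    for key, value in rows.items():
        buckets[partition_of(key, num_partitions)].append((key, value))
    hashes = {}
    for pid, items in buckets.items():
        h = hashlib.sha256()
        # Sort so the digest does not depend on insertion order.
        for item in sorted(items, key=repr):
            h.update(repr(item).encode())
        hashes[pid] = h.hexdigest()
    return hashes


def synchronize(source, target, num_partitions=8):
    """Bring target in line with source, copying only partitions whose hashes differ.

    Returns the sorted list of partition ids that had to be refreshed.
    """
    src = partition_hashes(source, num_partitions)
    tgt = partition_hashes(target, num_partitions)
    stale = [pid for pid in set(src) | set(tgt) if src.get(pid) != tgt.get(pid)]
    for pid in stale:
        # Drop the target's copy of the partition, then reload it from the source.
        for key in [k for k in target if partition_of(k, num_partitions) == pid]:
            del target[key]
        for key, value in source.items():
            if partition_of(key, num_partitions) == pid:
                target[key] = value
    return sorted(stale)


if __name__ == "__main__":
    source = {i: f"v{i}" for i in range(100)}
    target = dict(source)
    target[5] = "stale"      # modified tuple
    del target[7]            # missing tuple
    target[1000] = "extra"   # tuple deleted at the source
    print("refreshed partitions:", synchronize(source, target))
    print("in sync:", source == target)
```

In this sketch only the partitions containing a changed, missing, or extra tuple are transferred, which is why the approach pays off when the two sides are already nearly synchronized. The work is initiated by the caller on the target side and reads the source without altering it, in line with the constraint that the source's schema and triggers stay unchanged.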
We consider a special case of association rule mining in which a third party mines data held at a central location that is updated from several source locations. The data at the central location is at rest, while the data flowing in from the source locations is in motion. We impose some limitations on the source locations, so that the central target location tracks and privatizes changes and the third party mines the data incrementally. Our results show high efficiency, privacy, and rule accuracy for small to moderate updates to large volumes of data. We therefore believe that the framework we develop is applicable and valuable for mining big data.
This chapter gives a synopsis of existing techniques in privacy preserving data mining. Privacy preserving data mining is important because there is a need to develop accurate data mining models without using confidential data items in individual records. By providing a neat categorization of the current algorithms that preserve privacy for major data mining tasks, the authors hope that students, teachers, and researchers can gain an understanding of this vast area and apply that knowledge to find new ways of preserving privacy while conducting mining.