Privacy-Preserving Distributed Data Mining And Processing On Horizontally Partitioned Data
Tech report number
CERIAS TR 2005-51
Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Data warehousing, which brings data from multiple sources under a single authority, increases the risk of privacy violations. Furthermore, privacy concerns may prevent the parties from directly sharing even some metadata. Distributed data mining and processing provide a means to address this issue, particularly if queries are processed in a way that avoids disclosing any information beyond the final result. This thesis presents methods to mine horizontally partitioned data without violating privacy and shows how to use the data mining results in a privacy-preserving way. The methods incorporate cryptographic techniques to minimize the information shared while adding as little overhead as possible to the mining and processing task.
2005-08
LIST OF TABLES
LIST OF FIGURES
ABBREVIATIONS
ABSTRACT
1 Introduction
2 Privacy-preserving Data Mining: State-of-the-art and Related Issues
3 General Secure Multi-party Computation and Cryptographic Tools
4 Privacy-preserving Distributed Association Rule Mining
5 Privacy-preserving Distributed k-Nearest Neighbor Classification
6 Privacy-preserving Distributed Naive Bayes Classifier
7 When do Data Mining Results Violate Privacy?
8 Using Decision Rules for Private Classification
9 Summary
LIST OF REFERENCES
VITA
Data Mining, Privacy