CERIAS - Center for Education and Research in Information Assurance and Security

Skip Navigation
CERIAS Logo
Purdue University
Center for Education and Research in Information Assurance and Security

Privacy-Preserving Analysis with Applications to Textual Data

Download

Download PDF Document
PDF

Author

Balamurugan Anandan

Tech report number

CERIAS TR 2017-01

Entry type

phdthesis

Abstract

Textual data plays a very important role in decision making and scientific research, but cannot be shared freely if they contain personally identifiable information. In this dissertation, we consider the problem of privacy-preserving text analysis, while satisfying a strong privacy definition of differential privacy. We first show how to build a two-party differentially private secure protocol for computing similarity of text in the presence of malicious adversaries. We then relax the utility requirement of computational differential privacy to reduce computational cost, while still giving security with rational adversaries. Next, we consider the problem of building a data-oblivious algorithm for minimum weighted matching in bipartite graphs, which has applications to computing secure semantic similarity of documents. We also propose a secure protocol for detecting articulation points in graphs. We then relax the strong data-obliviousness definition to give $\epsilon$-data-obliviousness based on the notion of indistinguishability, which helps us to develop efficient protocols for data-dependent algorithms like frequent itemset mining. Finally, we consider the problem of privacy-preserving classification of text. A main problem in developing private protocols for unstructured data is high dimensionality. This dissertation tackles high dimensionality by means of differentially private feature selection. We show that some of the well known feature selection techniques perform poorly due to high sensitivity and we propose techniques that perform well in a differential private setting. The feature selection techniques are empirically evaluated using differentially private classifiers: na\"{i}ve Bayes, support vector machine and decision trees.

Download

PDF

Date

2017 – 5 – 1

Key alpha

Anandan

Publication Date

2017-05-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.