Privacy Risk and Scalability of Differentially-Private Data Anonymization

Get BibTex-formatted data

Download

PDF

Author

Mohamed R. Fouad

Tech report number

CERIAS TR 2012-10

Entry type

phdthesis

Abstract

Although data disclosure is advantageous for many obvious reasons, it may incur some risk resulting from potential security breaches. An example of such privacy violation occurs when an adversary reconstructs the original data using additional information. Moreover, sharing private information such as address and telephone number in social networks is always subject to a potential misuse. In this dissertation, we address both the scalability and privacy risk of data anonymization. We develop a framework that assesses the relationship between the disclosed data and the resulting privacy risk and use it to determine the optimal set of transformations that need to be performed before data is disclosed. We propose a scalable algorithm that meets differential privacy when applying a specific random sampling. The main contribution of this dissertation is three-fold: (i) we show that determining the optimal transformations is an NP-hard problem and propose a few approximation heuristics, which we justify experimentally, (ii) we propose a personalized anonymization technique based on an aggregate (Lagrangian) formulation and prove that it could be solved in polynomial time, and (iii) we show that combining the proposed aggregate formulation with specific sampling gives an anonymization algorithm that satisfies differential privacy. Our results rely heavily on exploring the supermodularity properties of the risk function, which allow us to employ techniques from convex optimization. Finally, we use the proposed model to assess the risk of private information sharing in social networks. Through experimental studies we compare our proposed algorithms with other anonymization schemes in terms of both time and privacy risk. We show that the proposed algorithm is scalable. Moreover, we compare the performance of the proposed approximate algorithms with the optimal algorithm and show that the sacrifice in risk is outweighed by the gain in efficiency.

Download

PDF

Date

2012 – 7 – 3

Booktitle

Privacy Risk and Scalability of Differentially-Private Data Anonymization

Institution

CERIAS, Purdue University

Key alpha

Fouad

School

Purdue University

Affiliation

Purdue University, Google Inc.

Publication Date

2012-07-03

- Privacy Risk Formalism - Framework Evaluation and Implementation - Risk-Utility-Based Algorithms for Data Disclosure - Scalability of Data Anonymization - Differential Privacy - Applying Privacy Risk Formalism on Social Networks

Subject

Data Privacy

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.

@Phdthesis{ Fouad,
	title = "Privacy Risk and Scalability of Differentially-Private Data Anonymization",
	author = "Mohamed R. Fouad",
	year = "2012",
	month = "7",
	booktitle = "Privacy Risk and Scalability of Differentially-Private Data Anonymization",
	institution = "CERIAS, Purdue University",
	school = "Purdue University",
	day = "3",
	abstract = "Although data disclosure is advantageous for many obvious reasons, it may incur some risk resulting from potential security breaches. An example of such privacy violation occurs when an adversary reconstructs the original data using additional information. Moreover, sharing private information such as address and telephone number in social networks is always subject to a potential misuse. In this dissertation, we address both the scalability and privacy risk of data anonymization. We develop a framework that assesses the relationship between the disclosed data and the resulting privacy risk and use it to determine the optimal set of transformations that need to be performed before data is disclosed.
We propose a scalable algorithm that meets differential privacy when applying a specific random sampling.

The main contribution of this dissertation is three-fold: (i) we show that determining the optimal transformations is an NP-hard problem and propose a few approximation heuristics, which we justify experimentally, (ii) we propose a personalized anonymization technique based on an aggregate (Lagrangian) formulation and prove that it could be solved in polynomial time, and (iii) we show that combining the proposed aggregate formulation with specific sampling gives an anonymization algorithm that satisfies differential privacy. Our results rely heavily on exploring the supermodularity properties of the risk function, which allow us to employ techniques from convex optimization. Finally, we use the proposed model to assess the risk of private information sharing in social networks.

Through experimental studies we compare our proposed algorithms with other anonymization schemes in terms of both time and privacy risk. We show that the proposed algorithm is scalable. Moreover, we compare the performance of the proposed approximate algorithms with the optimal algorithm and show that the sacrifice in risk is outweighed by the gain in efficiency.",
	affiliation = "Purdue University, Google Inc.",
	contents = "- Privacy Risk Formalism
- Framework Evaluation and Implementation
- Risk-Utility-Based Algorithms for Data Disclosure
- Scalability of Data Anonymization
- Differential Privacy
- Applying Privacy Risk Formalism on Social Networks",
	subject = "Data Privacy",
}

Privacy Risk and Scalability of Differentially-Private Data Anonymization

Download

Author

Tech report number

Entry type

Abstract

Download

Date

Booktitle

Institution

Key alpha

School

Affiliation

Publication Date

Contents

Subject

BibTex-formatted data