CERIAS - Center for Education and Research in Information Assurance and Security

Skip Navigation
Purdue University
Center for Education and Research in Information Assurance and Security

Privacy Risk and Scalability of Differentially-Private Data Anonymization


Download PDF Document


Mohamed R. Fouad

Tech report number

CERIAS TR 2012-10

Entry type



Although data disclosure is advantageous for many obvious reasons, it may incur some risk resulting from potential security breaches. An example of such privacy violation occurs when an adversary reconstructs the original data using additional information. Moreover, sharing private information such as address and telephone number in social networks is always subject to a potential misuse. In this dissertation, we address both the scalability and privacy risk of data anonymization. We develop a framework that assesses the relationship between the disclosed data and the resulting privacy risk and use it to determine the optimal set of transformations that need to be performed before data is disclosed. We propose a scalable algorithm that meets differential privacy when applying a specific random sampling. The main contribution of this dissertation is three-fold: (i) we show that determining the optimal transformations is an NP-hard problem and propose a few approximation heuristics, which we justify experimentally, (ii) we propose a personalized anonymization technique based on an aggregate (Lagrangian) formulation and prove that it could be solved in polynomial time, and (iii) we show that combining the proposed aggregate formulation with specific sampling gives an anonymization algorithm that satisfies differential privacy. Our results rely heavily on exploring the supermodularity properties of the risk function, which allow us to employ techniques from convex optimization. Finally, we use the proposed model to assess the risk of private information sharing in social networks. Through experimental studies we compare our proposed algorithms with other anonymization schemes in terms of both time and privacy risk. We show that the proposed algorithm is scalable. Moreover, we compare the performance of the proposed approximate algorithms with the optimal algorithm and show that the sacrifice in risk is outweighed by the gain in efficiency.




2012 – 7 – 3


Privacy Risk and Scalability of Differentially-Private Data Anonymization


CERIAS, Purdue University

Key alpha



Purdue University


Purdue University, Google Inc.

Publication Date



- Privacy Risk Formalism - Framework Evaluation and Implementation - Risk-Utility-Based Algorithms for Data Disclosure - Scalability of Data Anonymization - Differential Privacy - Applying Privacy Risk Formalism on Social Networks


Data Privacy

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.