Data Anonymization


Principal Investigators: Ninghui Li and Elisa Bertino

Agencies and other organizations often need to publish microdata, such as medical or census data, for research and other purposes. While the released datasets provide valuable information to researchers, they also contain sensitive information about individuals whose privacy may be at risk. One approach to reducing disclosure risks is to anonymize the microdata before it is released. Research in data anonymization aims to limit disclosure risks to an acceptable level while maximizing data utility. In this project, we study several fundamental issues in balancing privacy and utility in microdata publishing:

  • Existing privacy requirements for data publishing, such as k-anonymity, l-diversity, and t-closeness, all have limitations and shortcomings in protecting against attribute disclosure while preserving data utility. We work on building a robust and effective privacy requirement (a sketch of these requirements follows this list).
  • An adversary with additional background knowledge about the dataset can make more precise inferences about individuals' sensitive attribute values. We study approaches to modeling the adversary's background knowledge and techniques to prevent background-knowledge attacks.
  • Little existing research addresses anonymization of datasets that are continuously updated. Such a dynamic setting requires defining a new notion of privacy and developing techniques to achieve it.
  • A careful study of the privacy/utility trade-off will help us better understand the data publishing process as a whole.
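To make the first direction concrete, the following is a minimal sketch (illustrative only, not the project's method) of how the three requirements above can be checked on a toy table. The attribute names and records are hypothetical, and t-closeness is approximated here with total variation distance rather than the Earth Mover's Distance used in its original definition.

    from collections import Counter, defaultdict

    # Hypothetical toy table: each record pairs generalized
    # quasi-identifiers with a sensitive value.
    records = [
        ({"zip": "479**", "age": "20-29"}, "flu"),
        ({"zip": "479**", "age": "20-29"}, "cancer"),
        ({"zip": "479**", "age": "20-29"}, "flu"),
        ({"zip": "478**", "age": "30-39"}, "hepatitis"),
        ({"zip": "478**", "age": "30-39"}, "cancer"),
        ({"zip": "478**", "age": "30-39"}, "flu"),
    ]

    def equivalence_classes(records):
        """Group records sharing the same quasi-identifier values."""
        classes = defaultdict(list)
        for qi, sensitive in records:
            classes[tuple(sorted(qi.items()))].append(sensitive)
        return classes.values()

    def is_k_anonymous(records, k):
        """k-anonymity: every equivalence class has at least k records."""
        return all(len(c) >= k for c in equivalence_classes(records))

    def is_distinct_l_diverse(records, l):
        """Distinct l-diversity (the simplest form): every equivalence
        class contains at least l distinct sensitive values."""
        return all(len(set(c)) >= l for c in equivalence_classes(records))

    def max_t_closeness_gap(records):
        """For each class, measure the gap between its sensitive-value
        distribution and the whole table's, using total variation
        distance as a stand-in for the Earth Mover's Distance of the
        original t-closeness definition. A table satisfies the
        (approximated) requirement if this worst-case gap is <= t."""
        n = len(records)
        global_dist = Counter(s for _, s in records)
        worst = 0.0
        for c in equivalence_classes(records):
            local = Counter(c)
            gap = 0.5 * sum(
                abs(local[v] / len(c) - global_dist[v] / n)
                for v in global_dist
            )
            worst = max(worst, gap)
        return worst

    print(is_k_anonymous(records, k=3))         # True
    print(is_distinct_l_diverse(records, l=2))  # True
    print(f"worst-case distribution gap: {max_t_closeness_gap(records):.3f}")

In this toy table both equivalence classes contain three records and at least two distinct diseases, so the table is 3-anonymous and distinct 2-diverse; the printed gap shows how far a class's disease distribution drifts from the table-wide one, which is the quantity t-closeness bounds.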

Personnel

  • Ji-Won Byun (graduated in May 2007)
  • Tiancheng Li

Keywords: microdata, privacy, utility, attack, anonymization