Efficient k-Anonymization Using Clustering Techniques

Author

Elisa Bertino

Entry type

book

Abstract

k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation here is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic whose complexity is O(n^2). As part of our approach, we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.
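The abstract's two ingredients, an information-loss metric that covers numeric and categorical attributes and a greedy heuristic for k-member clustering, can be illustrated with a small Python sketch. This is not the authors' code. The cost formula (normalized numeric range plus normalized height of the taxonomy subtree rooted at the lowest common ancestor, scaled by cluster size), the hypothetical `Schema` type, the use of a pair's information loss as the distance between two records, the deterministic starting record, and all helper names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Schema:
    """Hypothetical schema: numeric attributes first, then categorical ones."""
    num_widths: list[float]           # full domain width of each numeric attribute
    taxonomies: list[dict[str, str]]  # child -> parent map per categorical attribute
    heights: list[int]                # height of each taxonomy tree


def ancestors(tax: dict[str, str], value: str) -> list[str]:
    """Path from a value up to the taxonomy root, inclusive."""
    path = [value]
    while value in tax:
        value = tax[value]
        path.append(value)
    return path


def lca_height(tax: dict[str, str], values) -> int:
    """Height of the subtree rooted at the lowest common ancestor of values.

    Assumes all values are leaves at the same depth, so the height equals
    the number of steps from any of the leaves up to the LCA.
    """
    paths = [ancestors(tax, v) for v in values]
    common = set(paths[0]).intersection(*paths[1:])
    for steps, node in enumerate(paths[0]):
        if node in common:
            return steps
    raise ValueError("values do not share a common root")


def info_loss(schema: Schema, cluster: list[tuple]) -> float:
    """Cluster size times the sum of normalized per-attribute generalization costs."""
    n_num = len(schema.num_widths)
    cost = 0.0
    for i, width in enumerate(schema.num_widths):
        vals = [r[i] for r in cluster]
        cost += (max(vals) - min(vals)) / width
    for j, (tax, height) in enumerate(zip(schema.taxonomies, schema.heights)):
        vals = {r[n_num + j] for r in cluster}
        cost += lca_height(tax, vals) / height
    return len(cluster) * cost


def greedy_k_member(schema: Schema, records: list[tuple], k: int) -> list[list[tuple]]:
    """Greedily partition records into clusters of at least k members each."""
    remaining = list(records)
    if len(remaining) < k:
        raise ValueError("need at least k records")
    clusters: list[list[tuple]] = []
    last = remaining[0]  # deterministic start instead of a random record
    while len(remaining) >= k:
        # Seed a new cluster with the record furthest from the last one placed.
        seed = max(remaining, key=lambda r: info_loss(schema, [last, r]))
        remaining.remove(seed)
        cluster = [seed]
        while len(cluster) < k:
            # Add the record that keeps the cluster's information loss lowest.
            best = min(remaining, key=lambda r: info_loss(schema, cluster + [r]))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
        last = cluster[-1]
    # Fewer than k records remain: each joins the cluster whose loss grows least.
    for r in remaining:
        target = min(clusters, key=lambda c: info_loss(schema, c + [r]) - info_loss(schema, c))
        target.append(r)
    return clusters


if __name__ == "__main__":
    diseases = {"Flu": "Respiratory", "Cold": "Respiratory",
                "Ulcer": "Digestive", "Gastritis": "Digestive",
                "Respiratory": "Any", "Digestive": "Any"}
    schema = Schema(num_widths=[100.0], taxonomies=[diseases], heights=[2])
    records = [(25, "Flu"), (27, "Cold"), (60, "Ulcer"), (62, "Gastritis"), (26, "Flu")]
    print(greedy_k_member(schema, records, k=2))
    # [[(62, 'Gastritis'), (60, 'Ulcer')], [(25, 'Flu'), (26, 'Flu'), (27, 'Cold')]]
```

Each pass of the outer loop fixes one cluster of exactly k records, and the fewer than k leftovers are absorbed at the end, so every output cluster has at least k members. For fixed k, the nested scans over the remaining records make the running time quadratic in n, in line with the O(n^2) bound stated in the abstract; recomputing `info_loss` from scratch for each candidate adds a factor of k that an incremental implementation could avoid.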

Date

2007

URL

http://www.springerlink.com/content/bhr261577503lx81/

Booktitle

Lecture Notes in Computer Science

Key alpha

Bertino

Pages

188-200

Publisher

Springer Berlin / Heidelberg

Volume

4443/2007

Affiliation

Purdue University

Publication Date

2007-01-01

BibTeX-formatted data

To cite this entry, select and copy the text below and paste it into your BibTeX file. Note that the text may not contain all macros that BibTeX supports.

@Book{ Bertino,
	title = "Efficient k-Anonymization Using Clustering Techniques",
	author = "Elisa Bertino",
	year = "2007",
	booktitle = "Lecture Notes in Computer Science",
	pages = "188--200",
	publisher = "Springer Berlin / Heidelberg",
	volume = "4443/2007",
	abstract = "k-anonymization techniques have been the focus of intense research in the last few years. An important requirement for such techniques is to ensure anonymization of data while at the same time minimizing the information loss resulting from data modifications. In this paper we propose an approach that uses the idea of clustering to minimize information loss and thus ensure good data quality. The key observation here is that data records that are naturally similar to each other should be part of the same equivalence class. We thus formulate a specific clustering problem, referred to as the k-member clustering problem. We prove that this problem is NP-hard and present a greedy heuristic whose complexity is $O(n^2)$. As part of our approach, we develop a suitable metric to estimate the information loss introduced by generalizations, which works for both numeric and categorical data.",
	affiliation = "Purdue University",
	url = "http://www.springerlink.com/content/bhr261577503lx81/",
}