TC: Small: Collaborative Protocols for Privacy-Preserving Scalable Record Matching and Ontology Alignment

Principal Investigator: Elisa Bertino

Many application domains, such as intelligence, counter-terrorism, forensics, and disease control, often need to cross-match multiple very large datasets, such as watch lists. Because those datasets may contain privacy-sensitive or confidential information, efficient privacy-preserving protocols for cross-matching them are crucial. The problem of privacy-preserving record matching has been addressed through Secure Multi-party Computation (SMC) protocols. Under these protocols, the matching computation is expressed as a series of functions with private inputs. However, a major drawback of SMC-based protocols is that they rely heavily on expensive cryptographic primitives, such as homomorphic encryption, which do not scale to the size of practical problems. As a result, SMC-based protocols cannot be applied directly to resource-constrained, data-intensive privacy-preserving record matching. This project develops a novel approach based on the observation that, to apply SMC to practical applications, one needs to bridge the gap between the size of the datasets that can be matched efficiently using SMC protocols and the size of the datasets seen in practice. The project tackles the problem by reducing the size of practical problems through privacy-preserving data sanitization methods. It addresses the privacy-preserving data matching problem through the following steps. First, to protect the privacy of data subjects, useful statistics about the data are gathered using differential privacy. Second, the differentially private statistics are shared among the parties involved in data matching. These parties then use the statistics to identify candidate record pairs that are likely to match; this step is referred to as data blocking. Finally, SMC techniques are applied to these candidates to cross-match the information accurately. In addition to syntactic matching, the project supports semantic matching, in which records are compared according to semantic similarity functions.
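
The three steps above can be illustrated with a minimal Python sketch. It is not the project's actual protocol: the blocking key (first letter of a record), the public block domain, the Laplace mechanism, the threshold rule, and all function names are assumptions chosen for illustration, and the plain equality test merely stands in for a real SMC comparison.

```python
import random
import string
from collections import Counter

# Assumption: a fixed, public block domain known to both parties.
BLOCKS = list(string.ascii_lowercase)

def block_key(record):
    # Hypothetical blocking key: the first letter of the record.
    return record[0].lower()

def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_block_counts(records, epsilon, rng):
    # Step 1: per-block counts released with Laplace noise. Adding or removing
    # one record changes one count by 1, so the noise scale is 1/epsilon.
    # Every block in the public domain gets a noisy count, including empty ones.
    counts = Counter(block_key(r) for r in records)
    return {b: counts.get(b, 0) + laplace_noise(1.0 / epsilon, rng) for b in BLOCKS}

def candidate_blocks(noisy_a, noisy_b, threshold):
    # Step 2: keep only blocks that appear non-empty for both parties.
    return [b for b in BLOCKS if noisy_a[b] > threshold and noisy_b[b] > threshold]

def match_within_blocks(records_a, records_b, blocks):
    # Step 3: compare records only inside candidate blocks.
    keep = set(blocks)
    by_block_b = {}
    for r in records_b:
        by_block_b.setdefault(block_key(r), []).append(r)
    matches, comparisons = [], 0
    for ra in records_a:
        key = block_key(ra)
        if key not in keep:
            continue
        for rb in by_block_b.get(key, []):
            comparisons += 1
            if ra == rb:  # placeholder for a secure (SMC) comparison
                matches.append((ra, rb))
    return matches, comparisons

rng = random.Random(0)
party_a = ["alice", "bob", "carol", "dave", "erin"]
party_b = ["alice", "bruce", "dave", "frank"]

noisy_a = noisy_block_counts(party_a, epsilon=1.0, rng=rng)
noisy_b = noisy_block_counts(party_b, epsilon=1.0, rng=rng)
blocks = candidate_blocks(noisy_a, noisy_b, threshold=0.5)
matches, comparisons = match_within_blocks(party_a, party_b, blocks)

print("candidate blocks:", blocks)
print("matches:", matches)
print("comparisons:", comparisons, "of", len(party_a) * len(party_b), "possible")
```

Because the counts are noisy, a block that truly contains matching records can fall below the threshold and be skipped, while an empty block can be kept. The choice of epsilon and threshold therefore trades privacy and the number of costly comparisons against the risk of missed matches.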


The semantic matching protocols include techniques for matching and aligning ontologies, as the use of ontologies is crucial for effective semantic matching. This project is the first to use differential privacy for efficient privacy-preserving record matching that also leverages a semantics-based approach and a privacy-preserving approach to ontology alignment. The techniques developed in the project are the first to achieve efficient privacy-preserving matching of large-scale datasets using differential privacy, thus overcoming the scalability problems of conventional SMC techniques. The approach expands the opportunities and contexts for data use by enabling the cross-matching of multiple data archives, possibly owned by different parties, without violating the privacy of the data. Many applications of interest to society will benefit from such opportunities.

Keywords: Differential Privacy, Privacy, Record Matching, SMC Protocols