The Center for Education and Research in Information Assurance and Security (CERIAS)


Language Support for Precise Privacy-Preserving Computation

Principal Investigator: Benjamin Delaware

Protecting the privacy of personal information is not just a matter of ethics, but of law: HIPAA and FERPA, for example, restrict how an individual’s private healthcare and educational records can be used and released. These laws interact in delicate ways with modern data analyses that depend on protected data. In the healthcare setting, for example, sophisticated machine-learned models have the potential to revolutionize the diagnosis, prevention, and treatment of complex diseases like cancer. In this scenario, however, the data being analyzed is the very information that cannot be shared, potentially not even with the developer of the proprietary diagnostic model.

The development of usable tools and techniques for expressing these sorts of privacy-preserving computations has been a topic of intense interest for several years, spurred in part by advances in cryptographic protocols and programming languages for secure multiparty computation (MPC). Despite recent progress, the current state of the art still suffers from several important limitations that hinder its adoption:

  1. Systems for privacy-preserving machine learning (e.g., Microsoft Research’s CHET framework) have mainly targeted neural networks, with less consideration for alternatives like decision forests. A key challenge is how to efficiently support the wider variety of computational patterns that arise in more general programs.
  2. There is a considerable gap between the high-level semantic privacy requirements of the user and the guarantees provided by the underlying system. This forces the developer to ensure, for example, that the bits of information leaked by a computation comply with the legal demands of HIPAA.
  3. There is often a fundamental tension between the efficiency of a particular computation and how much private information it leaks, and the programmer is responsible for navigating this tradeoff. In the face of complex models that depend on different types of private data with different privacy requirements, providing efficiency while guaranteeing privacy is a daunting task; the sketch following this list illustrates this tension for a single decision tree.
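
To make the tension concrete, the following sketch (plain Python, with no cryptographic backend; the types and function names are ours, purely for illustration) contrasts an ordinary decision-tree traversal, whose control flow reveals the path taken through the tree, with the kind of oblivious traversal an MPC or FHE backend must perform when the inputs are encrypted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: int = 0                 # index into the input vector (internal nodes)
    threshold: float = 0.0
    label: int = 0                   # prediction stored at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def classify_leaky(node: Node, x: list) -> int:
    """Ordinary traversal: O(depth) comparisons, but the branch taken at each
    step reveals information about the private input x."""
    while node.left is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

def classify_oblivious(node: Node, x: list) -> int:
    """Data-independent traversal: every node in the tree is evaluated and the
    result is combined by arithmetic selection, so the work performed does not
    depend on x. Under MPC/FHE the comparison and the selection would run on
    encrypted values; plain arithmetic stands in for that here."""
    if node.left is None:
        return node.label
    go_left = int(x[node.feature] <= node.threshold)   # a secret bit in a real system
    left = classify_oblivious(node.left, x)
    right = classify_oblivious(node.right, x)
    return go_left * left + (1 - go_left) * right      # oblivious select
```

The leaky version touches a number of nodes proportional to the tree's depth, while the oblivious version touches every node in the tree; this is exactly the kind of efficiency-versus-leakage tradeoff that current systems leave the programmer to manage by hand.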

This project aims to develop techniques for building efficient, general, privacy-preserving computations that tackle these key challenges. This high-level goal is divided into three complementary thrusts:

  1. A semantics-based approach to specifying privacy guarantees for secure computations (e.g., it may be safe to release a patient’s lab results as long as no personally identifying information (PII) is also leaked, but if PII is leaked, no other medical data can be released), as well as language support for writing data-intensive privacy-preserving models that use these specifications. Leveraging our previous work on oblivious high-level programming languages, we automatically synthesize efficient data structures and programs for performing privacy-preserving machine learning.
  2. Automatic generation of mixed-mode computations that selectively release “safe” information in order to produce more efficient programs; a sketch of this idea follows the list. We will build on our work on compiler optimization of FHE programs to create efficient, vectorized implementations of these programs.
  3. An investigation of how to automatically synthesize new programs that trade off information leakage for efficiency, while still meeting the necessary privacy requirements.
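
A minimal sketch of a mixed-mode computation is below. Everything in it (the `declassify` helper, the `POLICY_SAFE` set, and the per-bracket model layout) is hypothetical and only meant to illustrate the idea: a declarative policy names the facts that may be released in the clear, and releasing one coarse, policy-approved fact lets the program run a single model obliviously instead of obliviously evaluating all of them.

```python
# Hypothetical sketch of a mixed-mode computation; no real MPC/FHE backend
# is invoked, and all names here are illustrative.

AGE_BRACKETS = [(0, 17), (18, 64), (65, 200)]   # coarse, low-information buckets
POLICY_SAFE = {"age_bracket"}                    # facts the privacy policy allows releasing

def declassify(name: str, value):
    """Release a value in the clear only if the policy explicitly permits it."""
    if name not in POLICY_SAFE:
        raise PermissionError(f"policy forbids releasing {name}")
    return value

def age_bracket(age: int) -> int:
    """Map an exact (private) age to a coarse bracket index."""
    for i, (lo, hi) in enumerate(AGE_BRACKETS):
        if lo <= age <= hi:
            return i
    raise ValueError("age out of range")

def predict_mixed_mode(record: dict, model_per_bracket: list) -> int:
    """A fully oblivious program would run every bracket's model and select the
    answer arithmetically. Here the coarse bracket is declassified (the policy
    above permits it), so only one model needs to run on the remaining private
    fields, which is where the secure computation would actually take place."""
    bracket = declassify("age_bracket", age_bracket(record["age"]))
    model = model_per_bracket[bracket]
    return model(record)
```

In the project's terms, the second and third thrusts aim to generate such mixed-mode programs automatically and to check that whatever they release stays within the privacy requirements specified under the first thrust.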

Personnel

Other PIs: Milind Kulkarni

Students: Raghav Malik, Qianchuan Ye


Keywords: Fully Homomorphic Encryption, Privacy-Preserving Computation, Secure Multiparty Computation