Differential Privacy Methods for Machine Learning and Complex Data Structures

Research Areas: Assured Identity and Privacy

Principal Investigator: Jordan Awan

As more personal data is collected and analyzed, there is a growing need for formal privacy protection. Differential privacy (DP) has emerged as the state-of-the-art framework for privacy protection, but many DP methods are limited to simplistic settings and are not optimized for complex machine learning tasks. In this project, we develop and optimize DP algorithms for machine learning tasks that involve complex datasets. Specifically, we develop DP methods for 1) empirical risk minimization (which encompasses a wide variety of machine learning methods), 2) functional data analysis, and 3) topological data analysis.

Personnel

Other Faculty: Matthew Reimherr, Principal Research Scientist at Amazon and Affiliate Professor of Statistics, Penn State; Aleksandra Slavkovic, Professor of Statistics, Penn State; Vinayak Rao, Associate Professor of Statistics, Purdue University

Students: Taegyu Kang, Sehwan Kim, Jinwon Sohn, Ana Kenney

Representative Publications

Kang, Taegyu, Sehwan Kim, Jinwon Sohn, and Jordan Awan. "Differentially Private Topological Data Analysis." arXiv preprint arXiv:2305.03609 (2023).

Awan, Jordan, and Vinayak Rao. "Privacy-aware rejection sampling." Journal of Machine Learning Research 24, no. 74 (2023): 1-32.

Awan, Jordan, Ana Kenney, Matthew Reimherr, and Aleksandra Slavković. "Benefits and pitfalls of the exponential mechanism with applications to Hilbert spaces and functional PCA." In International Conference on Machine Learning, pp. 374-384. PMLR, 2019.

Reimherr, Matthew, and Jordan Awan. "KNG: The K-norm gradient mechanism." Advances in Neural Information Processing Systems 32 (2019).

Reimherr, Matthew, and Jordan Awan. "Elliptical perturbations for differential privacy." Advances in Neural Information Processing Systems 32 (2019).

Keywords: differential privacy, functional data analysis, machine learning, topological data analysis