The Center for Education and Research in Information Assurance and Security (CERIAS)

Directed Infusion of Data

Principal Investigator: Hany Abdel-Khalik

The Directed Infusion of Data (DIOD) paradigm is a novel data-based obfuscation procedure developed in response to growing data privacy concerns in the wake of the rise in complexity, scale, and capability of artificial intelligence and machine learning (AI/ML) tools. Data sharing and collaboration typically require the transfer of proprietary data, i.e., a stakeholder hands their data, usually in encrypted form, to a data analyst. Even when all parties are considered trustworthy, the act of distributing the data puts data privacy at unnecessary risk, thereby endangering financial resources, personal security, and proprietary or classified material. The key issue with ensuring data privacy is that the data need to be protected while retaining their utility; many proposed methods enforce limiting conditions that avoid sharing the data but sacrifice some of their utility in doing so, thus limiting the capability of the analyst during collaboration.

The DIOD paradigm seeks to obfuscate data efficiently while preserving both data security and utility. DIOD obfuscates the dynamic behavior of proprietary data with that of an unrelated dataset, so that the inference provided by the true data, e.g., classification, presence of anomalies, or variable dependencies, may be preserved in the new, obfuscated dataset. With DIOD's form of obfuscation, the data remain usable for the desired purpose, but the dynamic behavior, i.e., the 'identity', is changed so that proprietary details cannot be reverse engineered, thus mitigating the need to risk vital information during collaboration or outsourced computation.

A secondary benefit of DIOD is the flexibility with which the data can be masked. To add a further layer of security, the structure of the data can be altered, e.g., time-series data can be obfuscated as image-based data, thereby allowing data masking well suited to the needs of a particular analysis.

Personnel

Students: Arvind Sundaram, Tyler Lewis, Chloe Yoder

Representative Publications

Keywords: data masking, information security, secure collaboration