CERIAS - Center for Education and Research in Information Assurance and Security

Purdue University

Trustworthy Data From Untrusted Services

Research Areas: End System Security

Principal Investigator: Sunil Prabhakar

Increasingly, data are exposed to environments that can result in invalid (malicious or inadvertent) modifications. Such possibilities clearly arise when we host our data in a cloud computing setting, where we lack complete control over the hardware and software running on the cloud servers. They can also arise when the data are maintained on trusted servers but are modified by a malicious insider, or by an intruder who manages to compromise the server or the communication channels. In these situations, can we be assured that data retrieved from an untrusted server are trustworthy, that is, that neither the stored data nor the retrieved values have been tampered with or incorrectly modified?

The main goal of this project is to provide exactly this capability. We aim to develop protocols and tools that establish the trustworthiness of data by ensuring the authenticity and integrity of queries and updates over structured data. Our work aims to reduce how much trust must be placed in the servers (cloud or otherwise) for the data and query results to be trustworthy, i.e., free of tampering or error.

The proposed work applies to a number of scenarios, including two prominent ones: 1) ensuring that data maintained at a server have not been tampered with, while still allowing legitimate updates to be applied; and 2) ensuring the correctness of data retrievals and updates applied to data hosted in a cloud environment over which the data owner has no direct control.
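The page does not say which mechanism the project uses for these two scenarios. One standard building block for this kind of guarantee is a Merkle hash tree:

* The data owner keeps only a short root hash.
* The untrusted server stores the rows and returns each retrieved row together with the sibling hashes needed to recompute that root.
* The same sibling hashes let the owner check an update and derive the new root without holding the data.

The sketch below is illustrative only and is not the project's protocol. It makes the following assumptions:

* The table is a list of string rows, and the number of rows is a power of two.
* All function names are hypothetical.
* Leaf and internal-node hashes are distinguished by a 0x00 or 0x01 prefix, a convention also used in RFC 6962.

```python
import hashlib

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf_hash(row: str) -> bytes:
    # 0x00 prefix for leaves, 0x01 for internal nodes (domain separation).
    return _h(b"\x00" + row.encode("utf-8"))

def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)

def build_levels(rows):
    """Server side: every tree level, leaves first. Assumes len(rows) is a power of two."""
    level = [leaf_hash(r) for r in rows]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Server side: sibling hashes on the path from leaf `index` up to the root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def root_from_proof(row, index, proof):
    """Client side: recompute the root from a row, its position, and the proof."""
    acc = leaf_hash(row)
    for sibling in proof:
        acc = node_hash(acc, sibling) if index % 2 == 0 else node_hash(sibling, acc)
        index //= 2
    return acc

def verify_row(root, row, index, proof):
    return root_from_proof(row, index, proof) == root

def client_update(root, old_row, new_row, index, proof):
    """Client side: check the old row, then derive the new root from the same proof."""
    if not verify_row(root, old_row, index, proof):
        raise ValueError("server returned a tampered row or proof")
    return root_from_proof(new_row, index, proof)

# Usage: the owner keeps only `trusted_root`; the server keeps the rows and tree.
rows = ["alice,100", "bob,250", "carol,75", "dave,310"]
levels = build_levels(rows)
trusted_root = levels[-1][0]

proof = prove(levels, 1)
print(verify_row(trusted_root, "bob,250", 1, proof))  # True
print(verify_row(trusted_root, "bob,999", 1, proof))  # False: tampered value

trusted_root = client_update(trusted_root, "bob,250", "bob,300", 1, proof)
rows[1] = "bob,300"  # the server applies the same legitimate update
print(build_levels(rows)[-1][0] == trusted_root)  # True
```

The owner's state stays constant-size no matter how large the table grows, and each proof grows only logarithmically with the number of rows. Real systems also need to handle range queries, insertions that change the tree shape, and freshness (replay of an old root). The sketch does not address any of these.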

Although cloud computing holds great promise, it raises a number of security and privacy concerns. In particular, since clients have little or no direct control over the software and hardware running at the servers, they are reluctant to trust the server blindly. While cloud service providers are unlikely to be malicious, a server may compromise the integrity and validity of a client application or dataset either intentionally (e.g., to divert resources to other clients) or inadvertently (e.g., due to a software error, hardware failure, lack of proper policies, or incompetence). There is also the concern that an external attacker could compromise the server and corrupt the outsourced data or service.

Ensuring the integrity and authenticity of data is of ever-increasing importance as data are generated by multiple sources, often outside the direct control of fully trusted entities. After their initial generation, data may be corrupted or tampered with, either maliciously or inadvertently.
We propose to develop protocols that provide provable assurance about the authenticity and integrity of structured databases. In particular, we focus on the most common formats for structured data: relational databases, XML, and simple tabular data.

Our proposed solutions have an immediate and highly desirable benefit: the protocols for ensuring integrity and authenticity can also provide authentic provenance information at no extra effort or cost. Thus, not only can we ensure that the data are valid, we can also use the same structures that guarantee this validity to track the changes that have been applied to the data. In other words, our solutions can provide assured provenance for data.
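The page does not describe how the integrity structures yield provenance. One simple illustration of the idea is a hash-chained update log. Each update record includes the digest of the record before it, so a reader holding the latest "head" digest from a trusted source can check the entire history of changes. Any edited, dropped, or reordered entry breaks the chain. This is a minimal sketch under that assumption and not the project's design. `Entry`, `append`, `verify_log`, and `GENESIS` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # hypothetical placeholder "previous digest" for the first entry

@dataclass(frozen=True)
class Entry:
    author: str
    change: str
    prev: str    # digest of the preceding entry
    digest: str  # digest of (author, change, prev)

def entry_digest(author: str, change: str, prev: str) -> str:
    # Canonical JSON so the same fields always hash to the same value.
    payload = json.dumps({"author": author, "change": change, "prev": prev}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: list, author: str, change: str) -> str:
    """Record an update and return the new head digest."""
    prev = log[-1].digest if log else GENESIS
    log.append(Entry(author, change, prev, entry_digest(author, change, prev)))
    return log[-1].digest

def verify_log(log: list, trusted_head: str) -> bool:
    """Re-walk the chain; any edited, dropped, or reordered entry breaks it."""
    prev = GENESIS
    for e in log:
        if e.prev != prev or e.digest != entry_digest(e.author, e.change, e.prev):
            return False
        prev = e.digest
    return prev == trusted_head

log = []
append(log, "alice", "INSERT row 7")
head = append(log, "bob", "UPDATE row 7 SET qty = 5")
print(verify_log(log, head))  # True

# Rewriting history (here, the author of the first change) is detected.
log[0] = Entry("mallory", log[0].change, log[0].prev, log[0].digest)
print(verify_log(log, head))  # False
```

The chain only proves that the recorded history has not been altered since the head digest was obtained. The `author` field itself is just a claim. Binding each change to the party who made it would additionally require digital signatures, which this sketch omits.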

Our solutions will provide guarantees both to consumers of data from untrusted services and to the service providers themselves. In particular, they provide indemnity for the server. This is important in cloud computing because it protects an honest cloud service provider from false claims by malicious users, while demonstrating the fidelity of the hosted database.


Students: Rohit Jain, Romila Pradhan

Keywords: authentic outsourcing, authenticity, cloud computing, databases, indemnity, integrity, outsourcing, provenance, trustworthy data