Panel #1: Traitor Tracing and Data Provenance (Panel Summary)
Tuesday, April 5, 2011
- David W. Baker, MITRE
- Chris Clifton, Purdue
- Stephen Dill, Lockheed Martin
- Julia Taylor, Purdue
Panel Summary by Nikhita Dulluri
In the first session of the CERIAS symposium, the theme of ‘Traitor Tracing and Data Provenance’ was discussed. The panelists spoke extensively about the various aspects relating to tracing the source of a given piece of data and the management of provenance data. The following offers a summary of the discussion in this panel.
With increasing amounts of data being shared among various organizations such as health care centers, academic institutions, financial organizations and government organizations, there is need to ensure the integrity of data so that the decisions based on this data are effective. Providing security to the data at hand does not suffice, it is also necessary to evaluate the source of the data for its trust-worthiness. Issues such as which protection method was used, how the data was protected, and whether it was vulnerable to any type of attack during transit might influence how the user uses the data. It is also necessary to keep track of different types of data, which may be spread across various domains. Identification of the context of the data usage i.e., why a user might want to access a particular piece of data or the intent of data access is also an important piece of information to be kept track of.
Finding the provenance of data is important to evaluate its trustworthiness; but this may in-turn cause a risk to privacy. In case of some systems, it may be important to hide the source of information in order to protect its privacy. Also, data or information transfer does not necessarily have to be on a file to file exchange basis- there is also a possibility that the data might have been paraphrased. Data which has a particular meaning in a given domain may mean something totally different in another domain. Data might also be given away by people unintentionally. The question now would be how to trace back to the original source of information. A possible solution suggested to this was to pay attention to the actual communication, move beyond the regions where we are comfortable and to put a human perspective on them, for that is how we communicate.
Scale is one of the major issues in designing systems for data provenance. This problem can be solved effectively for a single system, but the more one tries to scale it to a higher level, the less effective the system becomes. Also, deciding how much provenance is required is not an easy question to answer, as one cannot assume that one would know how much data the user would require. If the same amount of information as the previous transaction was provided, then one might end up providing excess (or insufficient) data than what is required.
In order to answer the question about how to set and regulate policies regarding the access of data, it is important to monitor rather than control the access to data. Policies when imposed at a higher level are good, if there is a reasonable expectation that people will act accordingly to the policy. It is important not to be completely open about what information will be tracked or monitored, as, if there is a determined attacker, this information would be useful for him to find a way around it.
The issue of data provenance and building systems to manage data provenance has importance in several different fields. In domains where conclusions are drawn based on a set of data and any alterations to the data would change the decisions made, data provenance is of critical importance. Domains such as the DoD, Health care institutions, finance, control systems and military are some examples.
To conclude, the problem of data provenance and building systems to manage data provenance is not specific to a domain or a type of data. If this problem can be solved effectively in one domain, then it can be extended and modified to provide the solution to other domains as well.