Page Content

A Cautionary Incident


Recently, Amazon's cloud service failed for several customers, and has not come back fully for well over 24 hours. As of the time I write this, Amazon has not commented as to what caused the problem, why it took so long to fix, or how many customers it affected.

It seems a client of Amazon was not able to contact support, and posted in a support forum under the heading "Life of our patients is at stake - I am desperately asking you to contact." The body of the message was that "We are a monitoring company and are monitoring hundreds of cardiac patients at home. We were unable to see their ECG signals"

What ensued was a back-and-forth with others incredulous that such a service would not have a defined disaster plan and alternate servers defined, with the original poster trying to defend his/her position. At the end, as the Amazon service slowly came back, the original poster seemed to back off from the original claim, which implies either an attempt to evade further scolding (and investigation), or that the original posting was a huge exaggeration to get attention. Either way, the prospect of a mission critical system depending on the service was certainly disconcerting.

Personnel from Amazon apparently never contacted the original poster, despite that company having a Premium service contract.

25 or so years ago, Brian Reid defined a distributed system as "...one where I can't get my work done because a computer I never heard of is down." (Since then I've seen this attributed to Leslie Lamport, but at the time heard it attributed to Reid.) It appears that "The Cloud" is simply today's buzzword for a distributed system. There have been some changes to hardware and software, but the general idea is the same — with many of the limitations and cautions attendant thereto, plus some new ones unique to it. Those who extol its benefits (viz., cost) without understanding the many risks involved (security, privacy, continuity, legal, etc.) may find themselves someday making similar postings to support fora — as well as "position wanted" sites.

The full thread is available here.

Panel #4: Securing Web 2.0 (Panel Summary)


Wednesday, April 6, 2011

Panel Members:

  • Gerhard Eschelbeck, Webroot
  • Lorraine Kisselburgh, Purdue
  • Ryan Olson, Verisign
  • Tim Roddy, McAfee
  • Mihaela Vorvoreanu, Purdue

Panel Summary by Preeti Rao

The panel was moderated by Keith Watson, Research Engineer, CERIAS, Purdue University

Keith kick-started the panel with an interesting introduction to the term Web 2.0. He talked about how he framed its definition, gathering facts from Wikipedia, Google searches, comments and likes from Facebook, tweets from Twitter while playing Farmville, Poker on the Android phone!

All the panelists gave short presentations on Web 2.0 security challenges and solutions. These presentations introduced the panel topic from different perspectives - marketing, customer demands, industry/market analysis, technological solutions, academic research and user education.

Mihaela Vorvoreanu from Purdue University, who gave the first presentation, chose to use Andrew McAfee’s definition of Enterprise 2.0: a set of emerging social software collaborative platforms. She noted that the emphasis is on the word “platform” as opposed to “communication channels” because platforms are public and they support one-to-one communication which is public to all others, thus making it many-to-many communication.

She talked about the global study on Web 2.0 use in organizations which was commissioned by McAfee Inc, and reported by faculty at Purdue University. This study defined Web 2.0 to include consumer social media tools like Facebook, Twitter, YouTube and Enterprise 2.0 platforms. The study was based on a survey of over 1000 CIOs and CEOs in 17 countries, sample balanced by country, organization size, industry sector. The survey results were complimented with in-depth interviews with industry experts, analysts, academicians to get a comprehensive view of Web 2.0 adoption in organizations globally, its benefits and security concerns. While overall organizations reported great benefits and importance to using Web 2.0 in several business operations, the major concern was security - reported by almost 50% of the respondents. In terms of security vulnerabilities, social networking tools were reported to be the top threat followed by Webmail, content sharing sites, streaming media sites and collaborative platforms. Specific threats that organizations perceive from employee use of Web 2.0 included malware, virus, information over-exposure, spyware, data leaks. 70% of the respondents had security incidents in the past year and about 2 million USD were lost due to security incidents. The security measures reported by organizations included firewall protection, web filtering, gateway filtering, authentication and social media policies.

She presented a broad, global view of organizational uses, benefits and security concerns of Web 2.0.

Lorraine Kisselburgh from Purdue University continued to present the results from McAfee’s report. She discussed an interesting paradox that the study found.

Overall, there is a positive trend with significant adoption rate (75%) of Web 2.0 tools world-wide. There are also significant concerns among those who haven’t adopted the technology. 50% of non adopters report security concerns, followed by productivity, brand and reputation concerns. Not all tools have the same perceived value or even same concerns/risks/threats. Social networking tools and streaming media sites are considered most risky. Nearly half of the organizations banned Facebook. 42% banned IM, 38% banned YouTube. Collaborative platforms and content sharing tools are considered as less risky and their perceived value/usefulness is high when compared to social tools. But survey of those organizations who have adopted report the real value of social tools to be quite high - helpful in increasing communication, improving brand marketing etc. In fact social tools realized greater value than webmail etc.

So, the paradox is: social tools (social networking and streaming media sites) are mostly considered highly risky from a security standpoint, perceived least valuable to organizations, but yet they realize great value among adopters.

This reflects the continuing tensions between how the value of social media tools is perceived vs realized by organizations. This is also in-line with some historical trends in adopting new/unknown, emerging technologies. Example: email. The tensions are also because of where the technology is located and where to address risk: internal tools vs external on the cloud. It also has to do with recognizing organizational tools vs people tools.

Tim Roddy from McAfee addressed his comments on Web 2.0 security from a buying organization standpoint, giving it a product marketing perspective, about selling web security solutions. He commented that initially people were concerned about malware coming in to the organizations through email. Now the model and dynamics have changed and it has an influence on how we investigate our products and how we see our customers using security solutions from a business standpoint. His comments focussed on two areas: 1) stopping malicious software from coming in 2) having customizable controls for people using social media tools.

He pointed out that about 3 years ago, his customers were using their products to block access to sites like Twitter, Facebook because they saw no value in using them in businesses. But periodic McAfee surveys show a dramatic change in this trend. Organizations are allowing access to these tools; this trend is also driven by the younger generation of employees in the organizations demanding access. While it was a URL filtering solution that was used 3 years back to just block for eg, social networking sites category, now it is changed because they allow access to those websites.

So, how do we allow safe productive access?

There is a dramatic increase/acceleration in malware; they are automated, targeted and smarter now. Therefore web security efforts need to be proactive. By proactive security, it means not only to stop malware with signature analysis but include effective behavioral analysis to break the chains/patterns of attacks. McAfee’s Gateway Anti-Malware strategies focus on these.

Secondly, organizations allow access to social media tools now; but no one filters the apps in those tools to make sure they are legitimate. For eg: are the game apps on Facebook legitimate and secure? Such apps are one of the most common ways of attacks. The solution is to customize controls. Industries, especially finance and healthcare, are worried about leakage of data. Say, an employee sends his SSN through a LinkedIn message. Can it be blocked/filtered? Security solution efforts are now bi-directional – to proactively monitor and filter what is coming in as malware and what is going out as data leakage.

Lastly, the security concerns for use of mobile/handheld devices are growing. There is a great need to secure these devices, especially if corporately owned. It needs to have the same level of regulations and be compliant to corporate network standards.

Gerhard Eschelbeck from Webroot talked about why securing Web 2.0 a big deal and how we got there.

First gen of web apps were designed for static content to be displayed by browser. All execution processing was on server side and mostly trusted content. There were no issues about client/side browser side execution so the number of attacks happening was significantly less. The only worry then was to protect the servers. Now, the security concerns are mainly because of interactive content in Web 2.0. Fundamentally the model changes from 1-way-data from server to client to 2-way interactive model. Browser has become part of this execution environment. Billions of users’ browsers that are a part of this big ecosystem are exposed to attacks.

There is a major shift from code execution purely on server-side to distributed model of code execution using ajax and interactive, dynamic client side web page executions. While useful in many ways, it introduces new vulnerabilitie and this is the root cause for Web 2.0 security concerns.

He highlighted four areas of concerns:

  1. User created, user defined content which is not trusted content
  2. To bring desktop look and feel to the Web 2.0 applications, interactive features like mouse rollovers, popups have caused significant amount of interaction between server and client and this causes more vulnerabilities
  3. Syndication of content and mashups of various sites
  4. Offline capabilities of some applications now lead to storage of information on one of those billions of desktops

All these have led to increased security exposure points in turn leading to vulnerabilities.

Ryan Olson from Verisign talked about malware issues with Web 2.0.People are sharing a lot of their personal information online which they weren’t doing earlier. Access to personal information of people has become easy now, and is available to friends on social networks, or even anyone who has access to that friend’s account. A lot of organizations now have started using a security question/answer as a form of authentication after login/password. Answers to questions like user’s mother’s maiden name or high school name can be easily found on social networking sites. Most of such questions can be answered by looking at the user’s personal data that is available online, often without much authentication. This way Web 2.0 offers more vectors for malware. It offers many ways of communicating with people hence opening up to a lot of new entry points that we now need to monitor. Earlier it was mostly email and IM but now each of these social networks allow an attacker to send message, befriend and build trust. There are additional avenues provided by these tools to social-engineer the user into revealing some information about self, by exploiting the trust between user and his friends. A lot of malware are successful purely through social engineering attacks, by befriending them or enticing them and then extracting information. Primary solution to this problem is to educate people about the consequences of revealing personal information and the value of trust.

Questions from audience and discussions with the panel:

Keith Watson: How much responsibility should be held with the Web 2.0 providers (organizations like Facebook, Twitter) in providing secure applications? How much responsibility should be held with the users and educating them about safe usage? Is there a balance between user education and application provider responsibility?


TR: Just like any application provider, the companies do have a lot of responsibility; but educating the users is also equally important. Users are putting so much information out on the Web (for eg: Oh, I am in the airport). People should be made to realize how much and what to share.

RO: It should be a shared responsibility. It is the market that drives Web 2.0 to become more secure. For example, the competition between social network providers to provide a malware-free, secure application drives everything. If one social network is not as secure then users will just migrate to the next one. This way market will help and continue to put pressure on people in turn the providers to make secure applications.

LK: While it has to be a shared responsibility, it also has to do with recognizing the value of social media tools and encouraging its participation in businesses. Regarding user education, what we have found in some privacy research is that understanding the audience of these tools - who has access, what are they accessing, to whom are you disclosing, and being able to visualize who is listening helps the users in deciding what and how much information to disclose. Framing this through technology, system design would be helpful from an educational standpoint.

MV noted that there could be unintended, secondary audience always listening. She took a cultural approach to explain/understand social media tools. Each tool may be viewed as a different country – Facebook is a country, Twitter is another country. Just like how people from one country aren’t familiar with another country’s culture, and they may use travel guidebooks, travel information for help, users of social media tools need to be educated about the different social media tools and their inherent cultures.

GE: While the tourism and travel industry comparison is good, it doesn’t quite work always in the cyberworld because it is different. There is no differentiation anymore between dark and bright corners; even a site which “looks” safe might be a target of an awful attack Educational element is important but the technological safety belt is much needed. Securing is also hard for the fact that server-side component is usually from provider but client-side/browsers are with the people. It is important how we provide browser protection to users and reduce Web 2.0 attacks.

Brent Roth: What are your thoughts on organizations adopting mechanisms/models like the “no script add- on in Firefox”?


RO: This model would work really well for people who have some security knowledge/background, but doesn’t work for a common man. We need to look at smarter models for general public that make decisions about good and bad by putting the user in the safety belt.

TR: Websites get feeds and ads. While some may be malicious, they also drive the revenue. McAfee’s solutions block parts of the sites/pages which could be malicious. Behavioral analysis techniques help. It has to be a granular design solution.

RO: If all scripts are blocked then what about the advertisers? If we block all advertisers, the Internet falls because they drive the revenue. Yes, a lot of malware comes from ads and scripts but you cannot just completely block everything.

Malicious script analytics, risk profiling need to be done. The last line of defense is always at the browser end. User education is as important as having a technology safety belt to secure Web 2.0.

Panel #3: Fighting Through: Mission Continuity Under Attack (Panel Summary)


Tuesday, April 5, 2011

Panel Members:

  • Paul Ratazzi, Air Force Research Laboratories
  • Saurabh Bagchi, Purdue
  • Hal Aldridge, Sypris Electronics
  • Sanjai Narain, Telcordia
  • Cristina Nita-Rotaru, Purdue
  • Vipin Swarup, MITRE

Panel Summary by Christine Task

In Panel #3: “Fighting Through: Mission Continuity Under Attack”, each of the six panelists began by describing their own perspective on the problem of organizing real-time responses and maintaining mission continuity during an attack. They then addressed three questions from the audience.

Paul Ratazzi offered his unique insight as the technical advisor for the Cyber Defense and Cyber Science Branches at the Air Force Research Laboratory in Rome, NY. He noted that military organizations are necessarily already experienced at “guaranteeing mission essential functions in contested environments” and suggested that the cyber-security world could learn from their general approach. He divided this approach into four stages: Avoid threats (including hardening systems, working on information assurance, and minimizing vulnerabilities in critical systems), survive attacks (develop new, adaptive, real-time responses to active attacks), understand attacks (forensics), and recover from attacks (build immunity against similar future attacks). Necessary developments to meet these guidelines are improved understanding of requirements for critical functions (systems engineering) and real-time responses that go beyond our current monitor/detect/respond pattern. As a motivation for the latter, he gave the example of a fifth generation fighter, nicknamed a ‘flying network’. When its technological systems are under attack, looking through the log file afterwards is “too little, too late”.

Dr. Saurabh Bagchi of CERIAS and the Purdue School of Electrical and Computer Engineering described an innovative NSF-funded research project which offered real-time responses to attacks on large-scale, heterogeneous distributed systems. These systems involve a diverse array of third-party software and often offer a wide variety of vulnerabilities to an attacker. Additionally, attacks across these systems can spread incredibly quickly using trust relationships and privilege escalation, eventually compromising important internal resources. Any practical reaction must occur in machine-time. Dr. Bagchi’s research chose the following strategies: Use bayesian-inference to guess which components are currently compromised at a given time, and from that information estimate which are most likely to be attacked next. Focus monitoring efforts on those components precieved as at risk. Use knowledge of the distributed system to estimate the severity of the attack in progress, and respond appropriately with real-time containment steps such as randomizing configurations or restricting access to resources. Finally, he emphasized the importance of learning from each attack. Long-term responses should abstract the main characteristics of the attack and prepare defenses suited to any similar attacks in the future.

Dr. Sanjai Narain, a Senior Research Scientist in Information Assurance and Security at Telcordia Research, described his own work on distributed systems defense—a novel, concrete solution for the type of immediate containment suggested by Dr. Bagchi. Although the high-level abstraction of a network as a graph is relatively straightforward, the actual configuration space can be incredibly complex with very many variables to set at each node. ConfigAssure is an application which eliminates configuration errors by using SAT constraint solvers to find configurations which satisfy network specifications. For any given specification, there are likely many correct configurations. In order to successfully attack a network, an attacker must gain some knowledge of its layout (such as the location of gateway routers). By randomizing the network configuration between different correct solutions to the specification, an attacker can be prevented from learning anything useful about the network while the users themselves remain unaware of any changes.

Dr. Cristina Nita-Rotaru, an Assistant Director of CERIAS and an Associate Professor in the Department of Computer Science at Purdue, introduced an additional concern with maintaining mission continuity: maintaining continuity of communication. She offered the recent personal example of having her credit cards compromised while traveling. She was very quickly informed of this problem by her credit card companies and was thus able to make a risk-assessment of the situation and form a reasonable response (disabling one card while continuing to use the less vulnerable one until she could return home). When an attack compromises channels of communication, for example by taking out the network which would be used to communicate—as in jamming wireless networks, the information necessary to make a risk-assessment and form containment strategies is not available. Thus when considering real-time reactions to attacks, it’s important to make sure the communication network is redundant and resilient.

Dr. Hal Aldridge, the Director of Engineering at Sypris Electronics and a previous developer of unmanned systems for space and security applications at Northrop Grumman and NASA, discussed the utility of improving key-management systems to respond to real-time attacks. Key management systems which are agile and dynamic can help large organizations react immediately to threats. In a classic system with one or few secrets which are statically set, the loss of a key can be catastrophic. However, a much more robust solution is a centralized cryptographic key management system which uses a large, accurate model of the system to enable quickly changing potentially compromised keys, or using key changes to isolate potentially compromised resources. He briefly described his work on such a system.

Dr. Vipin Swarup, Chief Scientist for Mission Assurance Research in MITRE’s Information Security Division, emphasized one final very important point about real-time system defense: high-end threats are likely to exist inside the perimeter of the system. Our ability to prevent predictable low-end threats from entering the perimeter of our systems is reasonably good. However, we must also be able to defend against strategic, targeted, adaptive attacks which are able to launch from inside our security system. In this case, as the panel has discussed, the key problem is resiliency; we must be able to launch our real-time response from within a compromised network. Dr. Swarup summarized three main guidelines for approaching this problem: reduce threats (by deterring and disrupting attackers), reduce vulnerabilities (as Ratazzi described, understand system needs and protect critical resources), and reduce consequences (have a reliable response). Any real-time response strategy must take into account that the attacker will also be monitoring and responding to the defender, must be able to build working functionality on top of untrusted components, and must have a more agile response-set than simply removing compromised components.

After these introductions, there was time to address three questions to the panel [responses paraphrased].

“What time-scale should we consider when reconfiguring and reacting to an attack?”

Swarup: Currently we’re looking at attacks that flood a network in a day, and require a month to clean up [improvement is needed]. However, some attacks are multi-stage and take considerable time to execute [stuxnet]—these can be responded to on a human time scale.

Aldridge: It can take a lot of time to access all of the components in the network which need reconfiguring after an attack [some will be located in the ‘boonies’ of the network].

Bagchi: It can take seconds for a sensor to rest, while milliseconds are what’s needed.

“What are some specific attacks which require real-time responses?”

Aldridge: If you lose control of a key in the field, the system needs to eliminate the key easily and immediately.

Nita-Rotaru: When you are sending data on an overlay network, you need to be able to reroute automatically if a node becomes non-functional.

Narain: If you detect a sniffing attack, you can reroute or change the network-architecture to defend against it.

Ratazzi: Genetic algorithms can be used to identify problems at runtime and identify a working solution.

“What design principles might you add to the classic 8 to account for real-time responses/resiliency?”

Swarup & Nita-Rotaru: Assume all off-the-shelf mobile devices are compromised, focus on using them while protecting the rest of the system using partitioning and trust relationships, and by attempting to get trusted performance of small tasks over small periods of time in potentially compromised environment. Complete isolation [from/of compromised components] is probably impossible.

Ratazzi & Bagchi: minimize non-essential functionality of critical systems, focus on composing small systems to form larger ones, using segmentation-separate tools and accesses for separate functions-where possible to reduce impact of attack.

Panel #2: Scientific Foundations of Cyber Security (Panel Summary)


Tuesday, April 5, 2011

Panel Members:

  • Victor Raskin, Purdue
  • Greg Shannon, CERT
  • Edward B. Talbot, Sandia National Labs
  • Marcus K. Rogers, Purdue

Panel Summary by Pratik Savla

Edward Talbot initiated the discussion by presenting his viewpoint on Cyber security. He described himself as a seasoned practitioner in the field of cyber security. He highlighted his concerns for cyber security. The systems have become too complicated to provide an assurance of having no vulnerabilities. It is an asymmetrical problem. For an intruder, it may just take one door to penetrate the system but for the person managing the system, he/she would need to manage a large number of different doors. Any digital system can be hacked and any digital system that can be hacked will be hacked if there is sufficient value in that process. Talbot described problems in three variations: near-term, mid-term and long term. He used a fire-fighting analogy going back two centuries when on an average a U.S. city would be completely gutted and destroyed every five years. If the firefighters were asked about their immediate need, they would say more buckets are required. But, if they were asked what to do to prevent this from happening again, they had no answer. Talbot placed this concern into three time-frames: near-term, mid-term and long term. The first time frame involves the issue of what to do today to prevent this situation. The second timeframe tries to emphasize that it is important to be ahead of the game. The third timeframe involves the role of science. In this context, the development of a fire science program in academia. To summarize, he pointed out that the thinking that gets one into a problem is insufficient to get one out of the problem.

Talbot quoted a finding from the JASON report on the science of cyber security which stated that the highest priority should be assigned to the establishment of research protocols to enable reproducible experiments. Here, he stated that there is a science of cyber security. He concluded by comparing the scenario to being in the first step of a 12-step program (borrowing from Alcoholics Anonymous). It means to stop managing an unmanageable situation and instead developing a basis to rethink what one does.

Rogers focused on the the question: Do we have foundations that are scientifically based that can help answer some of the questions in form of research? Are we going in the right direction? This lead to a fundamental question: how we define a scientific foundation? What defines science? He highlighted some common axioms or principles such as body of knowledge, testable hypotheses, rigorous design and testing protocols and procedures, metrics and measurements, unbiased results and their interpretation, informed conclusions, repeatability as well as feedback into theory that are found across different disciplines. The problems that one comes across are non-existence of natural laws, man-made technologies in constant flux, different paradigms of research such as observational, experimental and philosophical, non-common language, extent of reliability and reproducibility of metrics, difference in approach such as applied versus basic, studying symptoms as opposed to causes. Cyber security is informed by a lot of disciplines such as physics, epidemiology, computer science, engineering, immunology, anthropology, economics and behavioral sciences.

The JASON report on the science of cyber security came out with strategies that are areas such as modeling and simulation which involved biological, decisional, inferential, medical as well as behavioral models that could be considered when viewing it on a scientific foundation. He emphasized that cyber security problems lend themselves to a scientific based approach. He stressed that there will be a scientific foundation for cyber security only if it is done correctly and only when one is conscious about what constituted a scientific foundation. Even solutions such as just-in-time, near-term and long-term can be based on a scientific foundation.

He pointed out that currently the biggest focus was on behavioral directive. In other words, how do we predict what will happen 20 years from now if employee ‘X’ is hired?

Shannon addressed the question: How do we apply the scientific method? Here, he presented the software engineering process. He discussed its various components by describing the different issues each one addresses. Firstly, what data do we have? What do we know? What can we rely on? What is something that we can stand on which is reasonably solid? Secondly, why do we have data that is prone to exploitation? He highlighted reasons such as lack of technology as well as mature technology, lack of education and lack of capacity. Here, he concluded that these hypotheses do not seem to stand the test of data as the data indicated we have always had problems. He then stated some alternative hypothesis such as market forces, people and networks that can be considered. He stressed on the point that solutions are needed based on what people and systems do, not what we wish they would do. The stumbling block for such a case is the orthodoxy of cyber security which means being in the illusion that by just telling people to do the right thing and using the right technology would lead to a solution to a problem. It is analogous to an alchemist who would state that just by telling the lead to turn gold, it would become gold. He stressed that we need to understand what is going on and what is really possible. The key message was that if there is a science that is built on data, it would involve much more than just theory.

Raskin took a more general view of cyber science by offering some of his thoughts on the subject. He said that he did not agree to the “American” definition of science which defines it as a small sub-list of disciplines where experiments can be run and immediate verification is possible as he considered it to be too narrow. He conformed to the notion of science wherein any academic discipline that is well-defined is a science. He presented a schematic of the theory-building process. It involved components such as phenomena which corresponded to a purview of the theory, theory, methodology and the description, which is a general philosophical term for results. The theory is connected to the methodology and a good theory would indicate why it can help guide the methodology. He asked why we were not questioning what we were doing. The first thought was related to the issue of data provenance i.e. why are you doing what are you doing? The second thought focused on the question of how we deal with different sciences that all part of cyber science. A mechanism that can help address that is that of rigorous application. He disagreed with the notion that combining two things without any import/export of sub-components leads to some worthy result. He stated that from the source field, components such as data, theory and methods should be imported to the target field. Only the problems of the source field should be excluded from being imported. The second thought emphasized about forming a linkage between the two fields; source and target by a common application. He concluded that without a theory, one does not know what one is doing and one does not know why one is doing it? It does not imply that there is no theory in existence. On the contrary, anything that is performed has an underlying theory and one may not be having any clue about that theory.

A question about complexity theory brought up an example of a bad scientific approach wherein the researcher adds more layer of complexity or keeps changing the research question but does not ever question the underlying theory which may be flawed.

Panel #1: Traitor Tracing and Data Provenance (Panel Summary)


Tuesday, April 5, 2011

Panel Members:

  • David W. Baker, MITRE
  • Chris Clifton, Purdue
  • Stephen Dill, Lockheed Martin
  • Julia Taylor, Purdue

Panel Summary by Nikhita Dulluri

In the first session of the CERIAS symposium, the theme of ‘Traitor Tracing and Data Provenance’ was discussed. The panelists spoke extensively about the various aspects relating to tracing the source of a given piece of data and the management of provenance data. The following offers a summary of the discussion in this panel.

With increasing amounts of data being shared among various organizations such as health care centers, academic institutions, financial organizations and government organizations, there is need to ensure the integrity of data so that the decisions based on this data are effective. Providing security to the data at hand does not suffice, it is also necessary to evaluate the source of the data for its trust-worthiness. Issues such as which protection method was used, how the data was protected, and whether it was vulnerable to any type of attack during transit might influence how the user uses the data. It is also necessary to keep track of different types of data, which may be spread across various domains. Identification of the context of the data usage i.e., why a user might want to access a particular piece of data or the intent of data access is also an important piece of information to be kept track of.

Finding the provenance of data is important to evaluate its trustworthiness; but this may in-turn cause a risk to privacy. In case of some systems, it may be important to hide the source of information in order to protect its privacy. Also, data or information transfer does not necessarily have to be on a file to file exchange basis- there is also a possibility that the data might have been paraphrased. Data which has a particular meaning in a given domain may mean something totally different in another domain. Data might also be given away by people unintentionally. The question now would be how to trace back to the original source of information. A possible solution suggested to this was to pay attention to the actual communication, move beyond the regions where we are comfortable and to put a human perspective on them, for that is how we communicate.

Scale is one of the major issues in designing systems for data provenance. This problem can be solved effectively for a single system, but the more one tries to scale it to a higher level, the less effective the system becomes. Also, deciding how much provenance is required is not an easy question to answer, as one cannot assume that one would know how much data the user would require. If the same amount of information as the previous transaction was provided, then one might end up providing excess (or insufficient) data than what is required.

In order to answer the question about how to set and regulate policies regarding the access of data, it is important to monitor rather than control the access to data. Policies when imposed at a higher level are good, if there is a reasonable expectation that people will act accordingly to the policy. It is important not to be completely open about what information will be tracked or monitored, as, if there is a determined attacker, this information would be useful for him to find a way around it.

The issue of data provenance and building systems to manage data provenance has importance in several different fields. In domains where conclusions are drawn based on a set of data and any alterations to the data would change the decisions made, data provenance is of critical importance. Domains such as the DoD, Health care institutions, finance, control systems and military are some examples.

To conclude, the problem of data provenance and building systems to manage data provenance is not specific to a domain or a type of data. If this problem can be solved effectively in one domain, then it can be extended and modified to provide the solution to other domains as well.