Panel #3: Fighting Through: Mission Continuity Under Attack (Panel Summary)
Tuesday, April 5, 2011
- Paul Ratazzi, Air Force Research Laboratories
- Saurabh Bagchi, Purdue
- Hal Aldridge, Sypris Electronics
- Sanjai Narain, Telcordia
- Cristina Nita-Rotaru, Purdue
- Vipin Swarup, MITRE
Panel Summary by Christine Task
In Panel #3: “Fighting Through: Mission Continuity Under Attack”, each of the six panelists began by describing their own perspective on the problem of organizing real-time responses and maintaining mission continuity during an attack. They then addressed three questions from the audience.
Paul Ratazzi offered his unique insight as the technical advisor for the Cyber Defense and Cyber Science Branches at the Air Force Research Laboratory in Rome, NY. He noted that military organizations are necessarily already experienced at “guaranteeing mission essential functions in contested environments” and suggested that the cyber-security world could learn from their general approach. He divided this approach into four stages: Avoid threats (including hardening systems, working on information assurance, and minimizing vulnerabilities in critical systems), survive attacks (develop new, adaptive, real-time responses to active attacks), understand attacks (forensics), and recover from attacks (build immunity against similar future attacks). Necessary developments to meet these guidelines are improved understanding of requirements for critical functions (systems engineering) and real-time responses that go beyond our current monitor/detect/respond pattern. As a motivation for the latter, he gave the example of a fifth generation fighter, nicknamed a ‘flying network’. When its technological systems are under attack, looking through the log file afterwards is “too little, too late”.
Dr. Saurabh Bagchi of CERIAS and the Purdue School of Electrical and Computer Engineering described an NSF-funded research project on real-time responses to attacks on large-scale, heterogeneous distributed systems. These systems involve a diverse array of third-party software and often offer a wide variety of vulnerabilities to an attacker. Additionally, attacks across these systems can spread very quickly using trust relationships and privilege escalation, eventually compromising important internal resources. Any practical reaction must occur in machine-time. Dr. Bagchi’s research chose the following strategies: Use Bayesian inference to estimate which components are compromised at a given time, and from that information estimate which are most likely to be attacked next. Focus monitoring efforts on those components perceived as at risk. Use knowledge of the distributed system to estimate the severity of the attack in progress, and respond appropriately with real-time containment steps such as randomizing configurations or restricting access to resources. Finally, he emphasized the importance of learning from each attack. Long-term responses should abstract the main characteristics of the attack and prepare defenses suited to any similar attacks in the future.
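The inference step can be illustrated with a minimal sketch. This is not Dr. Bagchi’s system: the function names, the single-alert detector model, and the fixed `spread_prob` are all assumptions made for illustration. The first function applies Bayes' rule to update the belief that one component is compromised after a monitor does or does not raise an alert; the second uses those beliefs and a map of trust relationships to rank which components are most likely to be reached next.

```python
def posterior_compromised(prior, p_alert_if_compromised, p_alert_if_clean, alert_seen):
    """Bayes' rule: P(compromised | observation) for a single component.

    All parameters are hypothetical detector characteristics, not measured values.
    """
    if alert_seen:
        like_comp, like_clean = p_alert_if_compromised, p_alert_if_clean
    else:
        like_comp, like_clean = 1 - p_alert_if_compromised, 1 - p_alert_if_clean
    evidence = like_comp * prior + like_clean * (1 - prior)
    return like_comp * prior / evidence


def next_target_risk(belief, trust_edges, spread_prob):
    """Estimate the chance each component is attacked next via a component it trusts.

    belief:      component -> current probability that it is compromised
    trust_edges: target -> list of components the target trusts (assumed model)
    spread_prob: assumed chance that a compromised component attacks a neighbor
    """
    risk = {}
    for target, trusted_sources in trust_edges.items():
        p_untouched = 1.0
        for src in trusted_sources:
            p_untouched *= 1 - belief[src] * spread_prob
        risk[target] = 1 - p_untouched
    return risk


if __name__ == "__main__":
    belief = {"web": posterior_compromised(0.1, 0.9, 0.05, alert_seen=True), "auth": 0.02}
    risk = next_target_risk(belief, {"db": ["web", "auth"]}, spread_prob=0.4)
    # Monitoring effort would be focused on the highest-risk components first.
    print(sorted(risk.items(), key=lambda kv: kv[1], reverse=True))
```

A real deployment would need a richer model (multiple detectors, attack stages, correlated alerts); the point here is only how a belief about current compromise feeds an estimate of where to monitor next.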
Dr. Sanjai Narain, a Senior Research Scientist in Information Assurance and Security at Telcordia Research, described his own work on distributed systems defense: a concrete solution for the type of immediate containment suggested by Dr. Bagchi. Although the high-level abstraction of a network as a graph is relatively straightforward, the actual configuration space can be very complex, with many variables to set at each node. ConfigAssure is an application which eliminates configuration errors by using SAT constraint solvers to find configurations which satisfy network specifications. For any given specification, there are likely many correct configurations. In order to successfully attack a network, an attacker must gain some knowledge of its layout (such as the location of gateway routers). By randomizing the network configuration among different correct solutions to the specification, defenders can prevent an attacker from learning anything useful about the network, while the users themselves remain unaware of any changes.
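The idea of moving between equally correct configurations can be shown with a toy sketch. It does not use ConfigAssure or a SAT solver: a brute-force enumeration stands in for the solver, and the node names, the host-ID pool, and the specification itself are hypothetical.

```python
import itertools
import random

# Hypothetical toy network: three nodes, each assigned a host ID from a small pool.
NODES = ["gateway", "db", "web"]
HOST_IDS = range(1, 6)


def satisfies_spec(cfg):
    """Toy specification: all host IDs distinct, and db not numerically adjacent to gateway."""
    distinct = len(set(cfg.values())) == len(cfg)
    return distinct and abs(cfg["gateway"] - cfg["db"]) > 1


def valid_configs():
    """Enumerate every assignment that meets the spec (stand-in for a constraint solver)."""
    configs = []
    for ids in itertools.product(HOST_IDS, repeat=len(NODES)):
        cfg = dict(zip(NODES, ids))
        if satisfies_spec(cfg):
            configs.append(cfg)
    return configs


def pick_config(rng, current=None):
    """Choose a random valid configuration that differs from the current one."""
    choices = [c for c in valid_configs() if c != current]
    return rng.choice(choices)


if __name__ == "__main__":
    rng = random.Random()
    cfg = pick_config(rng)
    for _ in range(3):
        cfg = pick_config(rng, current=cfg)  # every step stays within the spec
        print(cfg)
```

Brute force only works for a space this small; the configuration spaces described above are the reason a constraint solver is needed in practice.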
Dr. Cristina Nita-Rotaru, an Assistant Director of CERIAS and an Associate Professor in the Department of Computer Science at Purdue, introduced an additional concern with maintaining mission continuity: maintaining continuity of communication. She offered the recent personal example of having her credit cards compromised while traveling. She was very quickly informed of this problem by her credit card companies and was thus able to make a risk assessment of the situation and form a reasonable response (disabling one card while continuing to use the less vulnerable one until she could return home). When an attack compromises channels of communication, for example by jamming the wireless network that would be used to communicate, the information necessary to make a risk assessment and form containment strategies is not available. Thus when considering real-time reactions to attacks, it’s important to make sure the communication network is redundant and resilient.
Dr. Hal Aldridge, the Director of Engineering at Sypris Electronics and formerly a developer of unmanned systems for space and security applications at Northrop Grumman and NASA, discussed the utility of improving key-management systems to respond to real-time attacks. Key management systems which are agile and dynamic can help large organizations react immediately to threats. In a classic system with one or a few statically set secrets, the loss of a key can be catastrophic. A much more robust solution is a centralized cryptographic key management system which uses a large, accurate model of the system to quickly change potentially compromised keys, or to isolate potentially compromised resources through key changes. He briefly described his work on such a system.
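As a rough sketch of how a centralized manager might use its model of the system for both actions, the following assumes a simple mapping from key groups to the resources that hold each key. The `KeyManager` class, its method names, and the resource names are hypothetical and do not describe the Sypris system; key distribution and authentication are omitted entirely.

```python
import secrets


class KeyManager:
    """Toy central key manager: tracks which resources share which key."""

    def __init__(self):
        self.keys = {}     # key_id -> current key material (bytes)
        self.members = {}  # key_id -> set of resource names holding that key

    def create_group(self, key_id, resources):
        self.keys[key_id] = secrets.token_bytes(32)
        self.members[key_id] = set(resources)

    def rotate(self, key_id):
        """Replace a possibly compromised key; return who must receive the new one."""
        self.keys[key_id] = secrets.token_bytes(32)
        return sorted(self.members[key_id])

    def isolate(self, resource):
        """Cut off a suspect resource by rekeying every group it belonged to."""
        affected = []
        for key_id, group in self.members.items():
            if resource in group:
                group.discard(resource)
                self.rotate(key_id)  # remaining members get a key the suspect never saw
                affected.append(key_id)
        return affected


if __name__ == "__main__":
    km = KeyManager()
    km.create_group("ops", ["uav1", "uav2", "ground"])
    km.create_group("telemetry", ["uav2", "ground"])
    print(km.isolate("uav1"))  # only groups that included uav1 are rekeyed
```

The design choice illustrated is that the manager needs an accurate membership model: without knowing which resources hold which key, it can neither rekey quickly nor isolate selectively.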
Dr. Vipin Swarup, Chief Scientist for Mission Assurance Research in MITRE’s Information Security Division, emphasized one final very important point about real-time system defense: high-end threats are likely to exist inside the perimeter of the system. Our ability to prevent predictable low-end threats from entering the perimeter of our systems is reasonably good. However, we must also be able to defend against strategic, targeted, adaptive attacks which are able to launch from inside our security system. In this case, as the panel has discussed, the key problem is resiliency; we must be able to launch our real-time response from within a compromised network. Dr. Swarup summarized three main guidelines for approaching this problem: reduce threats (by deterring and disrupting attackers), reduce vulnerabilities (as Ratazzi described, understand system needs and protect critical resources), and reduce consequences (have a reliable response). Any real-time response strategy must take into account that the attacker will also be monitoring and responding to the defender, must be able to build working functionality on top of untrusted components, and must have a more agile response-set than simply removing compromised components.
After these introductions, there was time to address three questions to the panel [responses paraphrased].
“What time-scale should we consider when reconfiguring and reacting to an attack?”
Swarup: Currently we’re looking at attacks that flood a network in a day, and require a month to clean up [improvement is needed]. However, some attacks are multi-stage and take considerable time to execute [Stuxnet]; these can be responded to on a human time scale.
Aldridge: It can take a lot of time to access all of the components in the network which need reconfiguring after an attack [some will be located in the ‘boonies’ of the network].
Bagchi: It can take seconds for a sensor to reset, while milliseconds are what’s needed.
“What are some specific attacks which require real-time responses?”
Aldridge: If you lose control of a key in the field, the system needs to eliminate the key easily and immediately.
Nita-Rotaru: When you are sending data on an overlay network, you need to be able to reroute automatically if a node becomes non-functional.
Narain: If you detect a sniffing attack, you can reroute or change the network-architecture to defend against it.
Ratazzi: Genetic algorithms can be used to identify problems at runtime and identify a working solution.
“What design principles might you add to the classic 8 to account for real-time responses/resiliency?”
Swarup & Nita-Rotaru: Assume all off-the-shelf mobile devices are compromised. Focus on using them while protecting the rest of the system through partitioning and trust relationships, and by attempting to get trusted performance of small tasks over small periods of time in a potentially compromised environment. Complete isolation [from/of compromised components] is probably impossible.
Ratazzi & Bagchi: Minimize non-essential functionality of critical systems, focus on composing small systems to form larger ones, and use segmentation (separate tools and accesses for separate functions) where possible to reduce the impact of an attack.