Convicting Exploitable Software Vulnerabilities: Practical Input Provenance-Based Approach

Research Areas: End System Security

Principal Investigator: Xiangyu Zhang

Vulnerabilities in software, especially those that are remote exploitable, are the root cause of wave after wave of security attacks, such as botnet, zero-day worms, non-control data corruptions, and even server-break-ins. Thus, analyzing and exposing software vulnerabilities has become one of the most active research areas today. In the past, software vulnerability detection/exposing approaches could be divided into two categories: dynamic and static. Static analysis creates a lot of false positives. Dynamic approaches monitor program execution and detect attempts of attacking a software system. These techniques incur non-trivial runtime overhead and cannot detect vulnerabilities that not under attack. Dynamic test generation has the potential of generating exploit inputs to confirm vulnerabilities. However, most existing dynamic test generation techniques suffer from the scalability problem.  In this project, we develop a practical dynamic approach that is intended to use in combination with other static tools. We observe that although the suspect pool produced by existing static tools has a high false positive rate, it is nonetheless much smaller than the whole population. Therefore, we use existing static tools as the frontend to generate a set of suspects. Our technique then tries to generate exploits for these suspects. A suspect is convicted only when an exploit can be acquired as the evidence. Such exploits significantly assist regular users and administrators to evaluate the robustness of their software and convince vendors to debug and patch. The key idea is to use data lineage tracing to identify a set of input values relevant to the execution of a vulnerable code location. Exploit specific mutations are applied to the relevant input values in order to trigger an attack, e.g., for example, changing an integer value to MAXUINT to induce an integer overflow. Since these inputs are usually a very small subset of the whole input sequence, mutating the whole input, like in random test generation, is avoided. Our technique does not rely on symbolic execution and constraint solving and thus can easily handle long execution. In case an execution that covers a vulnerable code location cannot be found, our technique also allows user interactions to mutate an input so that the execution driven by the mutated input covers the vulnerable code location. Our technique addresses a wide range of vulnerabilities including buffer overflow, integer overflow, format string, etc. Our dynamic analysis works at binary level, which greatly facilitates users that do not have the source code access but are concerned about software vulnerabilities. We have developed a data lineage tracing prototype. It traces the set of input that is relevant to a particular execution point. The lineage information is used to guide our evidence generation procedure. The challenge of efficiency is overcome by using Reduced Ordered Binary Decision Diagrams (RoBDDs). Our initial experience with a set of known and unknown real vulnerabilities showed that our technique can very quickly generate exploit inputs.


Other PIs: Dongyan Xu

Representative Publications

  • Z. Lin, X. Zhang, and D. Xu, Convicting Remote Exploitable Vulnerabilities: An Efficient Input Provenance Based Approach, Proceedings of IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2008.

Keywords: dynamic test generation, software vulnerability