Improving Protocol Vulnerability Discovery via Semantic Interpretation of Textual Specifications
Principal Investigator: Dan Goldwasser
Two methods used for vulnerability discovery in network protocols are protocol testing and protocol property model checking. Both are tedious and time-consuming when applied to protocol implementations: testing requires significant manual effort to design test cases and testing scenarios, and model checking requires similar effort to formulate protocol requirements.
Both approaches require detailed and structured information about the tested protocols, in the form of messages, state machines, invariants, etc. Most of the time this information is derived manually by people with different levels of expertise. The process can be made more effective and less expensive by leveraging the documentation and specifications of these protocols that are available in text format. Automatically analyzing the information available in documentation in the form of textual specifications will open new avenues not only for improving vulnerability finding in network protocols, but also for software design in general.
In this project we combine expertise from natural language processing and network security to build a framework for vulnerability discovery in network protocols by leveraging semantic interpretation of textual specifications, automated attack generation and injection, and property model checking for software implementations. The framework consists of two phases: a knowledge building phase and a vulnerability finding phase. In the knowledge building phase, we apply semantic interpretation NLP techniques to structured text (RFCs and documentation) and unstructured text (blogs, forums, and bug reports) to learn structured information about protocols, such as message formats, protocol state machines, and constraints. In the vulnerability finding phase, we apply this information to two mechanisms. The first uses the structured protocol information to create and inject attacks. The second uses the same information to derive protocol requirements and model check finite state machines extracted from protocol implementations against them.
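As a rough illustration of how the two phases could fit together, the sketch below treats the output of the knowledge building phase as a table of allowed state transitions and compares it with a state machine extracted from an implementation. Everything in it is an assumption made for illustration: the toy handshake, the names SPEC, IMPL, and find_violation, and the plain breadth-first reachability search, which stands in for a real model checker and is not the project's actual method.

```python
from collections import deque

# Hypothetical output of the knowledge building phase: allowed transitions
# as (state, message) -> next state. The toy handshake is illustrative only.
SPEC = {
    ("CLOSED", "SYN"): "SYN_RCVD",
    ("SYN_RCVD", "ACK"): "ESTABLISHED",
    ("ESTABLISHED", "FIN"): "CLOSED",
}

# Hypothetical state machine extracted from an implementation. It accepts
# DATA before the handshake completes, which the specification does not allow.
IMPL = dict(SPEC)
IMPL[("SYN_RCVD", "DATA")] = "ESTABLISHED"


def find_violation(spec, impl, start):
    """Return the shortest message trace ending in a transition that the
    specification does not allow, or None if no such transition is reachable."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, trace = queue.popleft()
        for (src, msg), dst in impl.items():
            if src != state:
                continue
            if spec.get((src, msg)) != dst:
                # The implementation takes a step the specification lacks.
                return trace + [msg]
            if dst not in seen:
                seen.add(dst)
                queue.append((dst, trace + [msg]))
    return None


print(find_violation(SPEC, IMPL, "CLOSED"))  # ['SYN', 'DATA']
```

The returned trace serves both mechanisms in this toy setting: it is a counterexample to the requirement that the implementation follow the specified state machine, and it is also a candidate message sequence to inject against a running implementation.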
Other PIs: Cristina Nita-Rotaru
Keywords: Knowledge extraction, Natural language processing, Protocol vulnerability discovery