Reverse engineering of data structures involves two aspects: (1) given an application binary, inferring the data structure definitions; and (2) given a memory dump, inferring the data structure instances. These two capabilities have a number of security and forensics applications, including vulnerability discovery, kernel rootkit detection, and memory forensics.
In this dissertation, we present an integrated framework for reverse engineering of data structures from binaries. There are three key components in our framework: REWARDS, SigGraph, and DIMSUM. REWARDS is a data structure definition reverse engineering component that can automatically uncover both the syntax and semantics of data structures. SigGraph and DIMSUM are two data structure instance reverse engineering components that can recognize data structure instances in a memory dump. In particular, SigGraph can systematically generate non-isomorphic signatures for data structures in an OS kernel and enable brute-force scanning of kernel memory to find data structure instances. SigGraph relies on memory mapping information, whereas DIMSUM, which leverages probabilistic inference techniques, can directly scan memory without memory mapping information.
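To make the idea of brute-force signature scanning concrete, the following toy sketch (our illustration, not the SigGraph implementation) treats a structure's signature as the set of offsets that hold pointers and reports dump offsets whose candidate pointer fields all fall inside the dump's assumed virtual address range; the base address, word size, field offsets, and the function name scan are made up for the example.

\begin{verbatim}
import struct

WORD = 8                      # assume a 64-bit little-endian dump
BASE = 0xffff888000000000     # hypothetical virtual base address of the dump

def scan(dump, pointer_offsets, size):
    """Yield dump offsets that look like instances of a structure whose
    pointer fields (at pointer_offsets) point back into the dumped range."""
    hi = BASE + len(dump)
    for off in range(0, len(dump) - size + 1, WORD):
        ok = True
        for po in pointer_offsets:
            (val,) = struct.unpack_from("<Q", dump, off + po)
            if not (BASE <= val < hi):
                ok = False
                break
        if ok:
            yield off

# Usage: a 64-byte structure with pointer fields at offsets 0 and 24.
# candidates = list(scan(dump, pointer_offsets=[0, 24], size=64))
\end{verbatim}

A real signature would additionally require each pointed-to candidate to match the signature of the structure it is declared to point to, which is what makes the signatures graph-shaped and non-isomorphic.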
We have developed a number of enabling techniques in our framework, including (1) bi-directional (i.e., backward and forward) data flow analysis, (2) signature graph generation and comparison, and (3) belief-propagation-based probabilistic inference. This dissertation demonstrates how these techniques are integrated into our reverse engineering framework.
We have obtained the following preliminary experimental results. REWARDS achieved over 80% accuracy in revealing the definitions of data structures accessed during an execution. SigGraph recognized Linux kernel data structure instances with zero false negatives and close-to-zero false positives, and remained robust in the presence of malicious pointer manipulation. DIMSUM achieved an accuracy improvement of over 20% compared with previous non-probabilistic approaches when memory mapping information is unavailable.
Attribute-based systems enable fine-grained access control among a group of users, each identified by a set of attributes. Secure collaborative applications need such flexible attribute-based systems for managing and distributing group keys. However, current group key management schemes are not well designed to manage group keys based on the attributes of the group members. In this paper, we propose novel key management schemes that allow users whose attributes satisfy a certain access control policy to derive the group key. Our schemes efficiently support rekeying operations when the group changes due to joins or leaves of group members. During a rekey operation, the private information issued to existing members remains unaffected and only the public information is updated to change the group key. Our schemes are expressive: they can support any monotonic access control policy over a set of attributes. Our schemes are also resistant to collusion attacks: group members cannot pool their attributes to derive a group key that none of them can derive individually.
Number theory has long influenced information security and cryptography. This thesis adds further examples of that influence.
The first topic is related to Broadcast Group Key Management (BGKM) in cryptography. After the Access Control Polynomial (ACP) BGKM scheme was proposed, researchers examined its basic BGKM security properties and found a weakness in its key hiding property by exhibiting a counterexample for $p=2$. Here, we give strong evidence that its key hiding property is weak for all sufficiently large primes.
The second topic is SQUFOF (SQUare FOrm Factorization), a well-known integer factoring algorithm invented by Daniel Shanks. At present, SQUFOF is the fastest factoring algorithm for numbers between $10^{10}$ and $10^{18}$. In his thesis, Gower made conjectures about the probability distribution of the number of forms that SQUFOF must examine before finding a proper square form, and of the number of forms enqueued during the factorization of $N$. We propose a different probability distribution (geometric rather than exponential) from the one Gower proposed, and we use Gower’s data to support our conclusions.
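For reference, the two candidate shapes can be written in their standard textbook forms (these formulas are not taken from the thesis): a geometric distribution on a trial count $k$ with success probability $q$, versus an exponential density with rate $\lambda$,
\[
\Pr[X = k] = (1-q)^{\,k-1}\, q \quad (k = 1, 2, 3, \ldots), \qquad\qquad f(x) = \lambda e^{-\lambda x} \quad (x \ge 0).
\]
The geometric distribution is the discrete memoryless analogue of the exponential.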
The third topic is the period of the Bell numbers $B(n)$ modulo a prime. It was proved by Williams that the minimum period of the sequence $\{B(n) \bmod p\}$, $n = 0, 1, 2, \ldots$, divides $N_p=(p^p-1)/(p-1)$. In fact, the minimum period equals $N_p$ for every prime $p$ for which this period is known. Several people have conjectured that the minimum period is always $N_p$. This thesis presents a heuristic argument supporting the conjecture.
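The conjecture is easy to check computationally for the first few primes. The sketch below is our illustration (not taken from the thesis): it seeds the sequence with the Bell triangle, extends it with Touchard's congruence $B(n+p) \equiv B(n) + B(n+1) \pmod{p}$, and finds the minimal period by testing the divisors of $N_p$.

\begin{verbatim}
def bell_mod(p, count):
    """First `count` Bell numbers modulo p, via the Bell triangle."""
    row, bells = [1], [1]
    while len(bells) < count:
        new = [row[-1]]
        for x in row:
            new.append((new[-1] + x) % p)
        row = new
        bells.append(row[0])
    return bells

def bell_sequence(p, count):
    """Extend the first p values using Touchard's congruence
    B(n+p) = B(n) + B(n+1) (mod p)."""
    b = bell_mod(p, p)[:count]
    while len(b) < count:
        b.append((b[-p] + b[-p + 1]) % p)
    return b

def minimal_period(p):
    n_p = (p ** p - 1) // (p - 1)
    b = bell_sequence(p, n_p + p)
    # The sequence is purely periodic and its minimal period divides n_p,
    # so it suffices to test divisors of n_p; matching p consecutive values
    # pins down the state of the order-p recurrence.
    for d in (d for d in range(1, n_p + 1) if n_p % d == 0):
        if all(b[n + d] == b[n] for n in range(p)):
            return d, n_p

for p in (2, 3, 5):
    d, n_p = minimal_period(p)
    print(p, d, n_p, d == n_p)   # prints True for each of these primes
\end{verbatim}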
The tracking and monitoring of fugitives and persons of interest is of significant concern for the Indiana Department of Corrections (IDOC) Fugitive Detection Unit. This research was conducted to help determine the benefits of implementing a face recognition technology solution. Images were analyzed for standards compliance to help determine their suitability as input to a face recognition matcher. This analysis showed that the images were not in compliance with the NIST Mug Shot Best Practices, nor could the software optimize the images to make them compliant. A visit to the intake facility indicated that the process by which these mug shots were collected needed to be addressed before face recognition technology could be implemented. Consequently, the IDOC main prisoner intake facility’s mug shot image capture process was assessed. Using the image analysis, along with observations of the mug shot capture process, an optimized capture process was implemented for a trial period of two weeks to determine its effectiveness. Results show that the optimized capture process improved the standards compliance of the mug shot images, indicating that the images collected would be usable with face recognition technology. Another finding was that the centerline location ratio variable, which has a precise threshold, was not compliant for any image in either dataset, suggesting further study to determine whether this variable should use a range of values in an operational environment such as the IDOC.
IT-enabled products are the result of a fusion of IT with the core functionality of a product or device around us. This fusion is leading to numerous benefits and advantages that are just beginning to appear. However, with the increasing number and sophistication of vulnerabilities and threats in IT, IT-enabled products have also come into the line of fire. Due to the critical and diverse nature of these products, it is important that a holistic security framework exist that addresses security in the early phases of product development. The current state of security in IT-enabled products strongly suggests this need, as do the efforts of industry leaders in the respective fields. In this thesis, the author addresses security in IT-enabled products by proposing a new framework based on the Balanced Scorecard. The proposed framework uses the concept of the four views and other characteristics of the Balanced Scorecard, and it has a strong focus on security. The proposed framework has been evaluated by Prof. James E. Goldman, the chair of this thesis committee, and its application has also been demonstrated on one of the discussed case examples of security failures. From this research, it is concluded that the proposed framework can indeed effectively address security in IT-enabled products.
Mobile phones are increasingly a source of evidence in criminal investigations. The evidence on a phone is volatile and can easily be overwritten or deleted. Many tools claim to radio-isolate a phone in order to preserve evidence, but these wireless preservation devices do not always prevent network communication as promised. The purpose of this study was to identify situations in which the devices used to protect evidence on mobile phones can fail. Despite the escalating importance of mobile phone forensics, little research has been published on how well these devices work in the field. The shielding devices were tested using mobile phones from three of the largest service providers in the U.S. Attempts were made to contact the isolated phones using voice calls, SMS, and MMS at varying distances from the providers’ towers. In the majority of the test cases, the phones were not isolated from their networks despite being enclosed in a shielding device. SMS messages penetrated the shields most often, voice calls were the next most likely to penetrate, and MMS messages were the least likely.
Recent advances in sensor technologies have made sensors economical, power efficient, and portable enough to be mounted on handheld devices for monitoring different environmental factors, making mobile sensor networks possible. When it is financially infeasible to deploy a sufficient or excessive number of sensors, yet the environmental factors they monitor are critical to public health and safety, such as chemical or radiation monitoring, we deploy mobile sensors that move under our control, and we decide the best mobility strategy to achieve the desired goals. We propose and analyze mobility strategies that give well-balanced performance across various goals, which may be antagonistic. We observe that, for a stochastic mobility algorithm, pausing at a location is well justified: it achieves better quality in event monitoring and a closer match between the time a sensor spends at a location and the expected monitoring time. We also observe that the quality of event monitoring at a location may not be proportional to the time the sensors spend there. In other cases, when it is economical to deploy an excessive number of sensors by attaching them to electronic devices owned by the public, traces of mobile nodes are collected to help design and analyze such systems and to evaluate the expected performance before deployment. We are interested in studying privacy leakage through trace publication. Although identities in published traces are consistently replaced with random IDs, movements of mobile nodes can be openly observed by others or learned through web blogs, status updates in social networks, casual conversations, and so on. It is then possible for an attacker to learn the whole movement history of the participants, breaching the privacy protection. We study attack strategies comprehensively, both analytically and experimentally, using real and synthetic traces. We observe that, with high probability, an adversary can identify participants in the trace set at the current scale of trace collection and publication.
To secure today’s computer systems, it is critical to have different intrusion detection sensors embedded in them. The complexity of distributed computer systems makes it difficult to determine the appropriate choice and placement of these detectors, because there are many possible sensors that can be chosen and each sensor can be placed in several possible locations in the distributed system. In this paper, we describe a method to evaluate the effect a detector configuration has on the accuracy and precision with which the system’s security goals are determined. The method is based on a Bayesian network model obtained from an attack graph representation of multi-stage attacks. We use Bayesian inference to determine the likelihood that an attack goal has been achieved, given a certain set of detector alerts. We quantify the overall performance as a function of the placement, quality, and uncertainty in knowledge of the detectors. Based on these observations, we implement a dynamic programming algorithm for determining the optimal detector settings in a large-scale distributed system and compare it against a previously proposed greedy algorithm. Finally, we report the results of five experiments, measuring the Bayesian network’s behavior in the context of two real-world distributed systems undergoing attacks. Results show that the dynamic programming algorithm outperforms the greedy algorithm in terms of the benefit provided by the chosen set of detectors. The dynamic programming solution also has the desirable property that we can trade off running time against how close the solution is to optimal.
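As a toy illustration of the Bayesian inference step (our own example; the prior, conditional probabilities, and detector quality numbers are made up and are not the networks studied in the paper), the sketch below enumerates a two-variable attack graph, one intermediate step and one goal, each watched by an imperfect detector, and computes the posterior probability that the goal has been achieved given the observed alerts.

\begin{verbatim}
from itertools import product

P_STEP = 0.3                          # prior that the attack step occurs
P_GOAL_GIVEN_STEP = {1: 0.8, 0: 0.01} # goal reached given step / no step
DETECTORS = {                         # name -> (watched variable, TP rate, FP rate)
    "d_step": ("step", 0.9, 0.05),
    "d_goal": ("goal", 0.7, 0.02),
}

def joint(step, goal, alerts):
    """Joint probability of the hidden attack state and the observed alerts."""
    p = P_STEP if step else 1 - P_STEP
    pg = P_GOAL_GIVEN_STEP[step]
    p *= pg if goal else 1 - pg
    state = {"step": step, "goal": goal}
    for name, (var, tp, fp) in DETECTORS.items():
        rate = tp if state[var] else fp
        p *= rate if alerts[name] else 1 - rate
    return p

def posterior_goal(alerts):
    """P(goal achieved | alerts), by brute-force enumeration."""
    num = sum(joint(s, 1, alerts) for s in (0, 1))
    den = sum(joint(s, g, alerts) for s, g in product((0, 1), repeat=2))
    return num / den

# Step detector fired, goal detector silent:
print(posterior_goal({"d_step": 1, "d_goal": 0}))
\end{verbatim}

Scoring such posteriors for different detector configurations is the kind of quantity a placement algorithm can then optimize.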
Third-party data distribution frameworks such as the cloud are increasingly being employed to store, process, and publish sensitive information, such as healthcare and financial data, belonging to individuals and enterprises. Such data objects are often organized as trees, graphs, or even forests (e.g., XML). In third-party frameworks, not only is authentication of data important, but so are protection of privacy and assurance of confidentiality. Moreover, data authenticity must be assured even when the data object that a user has access to consists of subset(s) of the signed data.
Existing solutions such as the Merkle hash technique and redactable signature schemes leak structural information, which can be used to infer sensitive information and in turn lead to privacy and confidentiality breaches. So the question is: can we authenticate subset(s) of signed data objects without such leakage, and if so, how efficiently can such authentication be carried out? We report a positive result by presenting efficient and provably secure solutions not only for trees but also for graphs and forests. We present a scheme that computes only one signature per tree, graph, or forest.
Our schemes support encrypted data stored at third-party services. They can also be used to automatically recover from structural errors in tree-structured data and for leakage-free authentication of paths (e.g., XPaths). Further, as applications of our schemes, we have also developed a publish/subscribe model for XML based on structure-based routing, and a scheme for authentication of objects.
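The structural leakage of the baseline Merkle approach is easy to see in code. In the sketch below (our illustration with invented helper names and an example tree, not a scheme from the dissertation), verifying one node against the tree's Merkle root requires handing the verifier the sibling hashes and the node's position at every level, so the shape of the tree around that node is revealed even when sibling contents are not.

\begin{verbatim}
import hashlib

def H(*parts):
    m = hashlib.sha256()
    for part in parts:
        m.update(part if isinstance(part, bytes) else str(part).encode())
    return m.digest()

def merkle(node):
    """node = (content, [children]); Merkle hash of the subtree."""
    content, children = node
    return H(content, *[merkle(c) for c in children])

def proof_for(node, path):
    """Verification object for the node reached via child indexes in `path`:
    each step exposes the ancestor's content, the child's index, and all
    sibling hashes -- the structural information that leaks."""
    steps = []
    while path:
        content, children = node
        i, path = path[0], path[1:]
        steps.append((content, i, [merkle(c) for j, c in enumerate(children) if j != i]))
        node = children[i]
    return node, steps

def verify(leaf, steps, root):
    h = merkle(leaf)
    for content, i, siblings in reversed(steps):
        h = H(content, *(siblings[:i] + [h] + siblings[i:]))
    return h == root

tree = ("record", [("billing", []), ("diagnosis", [("lab-result", [])])])
root = merkle(tree)
leaf, steps = proof_for(tree, [1, 0])
assert verify(leaf, steps, root)
\end{verbatim}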
Privacy-preserving microdata publishing currently lacks a solid theoretical foundation. Most existing techniques are developed to satisfy syntactic privacy notions such as $k$-anonymity, which fail to provide strong privacy guarantees. The recently proposed notion of differential privacy has been widely accepted as a sound privacy foundation for statistical query answering. However, no general practical microdata publishing techniques are known to satisfy differential privacy. In this paper, we start to bridge this gap. We first analyze $k$-anonymization methods and show how they fail to provide sufficient protection against re-identification, the very attack they were designed to prevent. We then prove that $k$-anonymization methods, when performed “safely” and preceded by a random sampling step, can satisfy $(\epsilon,\delta)$-differential privacy with reasonable parameters. This result is, to our knowledge, the first to link $k$-anonymity with differential privacy, and it illustrates that “hiding in a crowd of $k$” indeed offers privacy guarantees. It naturally leads to future research in designing “safe” and practical $k$-anonymization methods. We observe that our result gives an alternative to output perturbation for satisfying differential privacy: namely, adding a random sampling step at the beginning and pruning results that are too sensitive to changing a single tuple. This approach may be applicable to settings other than microdata publishing. We also show that adding a random sampling step can greatly amplify the level of privacy guarantee provided by a differentially private algorithm. This result makes it much easier to provide strong privacy guarantees when one wishes to publish a portion of the raw data. Finally, we show that current definitions of $(\epsilon, \delta)$-differential privacy require $\delta$ to be very small to provide sufficient privacy protection when publishing microdata, making the notion impractical in some scenarios. To address this problem, we introduce a notion called $f$-smooth $(\epsilon, \delta)$-differential privacy.
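For reference, the standard definition of $(\epsilon,\delta)$-differential privacy (stated here in its usual textbook form, not verbatim from the paper) requires that, for every pair of datasets $D$ and $D'$ differing in one tuple and every set $S$ of possible outputs of the mechanism $\mathcal{A}$,
\[
\Pr[\mathcal{A}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{A}(D') \in S] + \delta .
\]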
Trusted collaborative computing (TCC) is a new research and application paradigm. Two important challenges in such a context are secure information transmission among the collaborating parties and selective, differentiated access to data among members of collaborating groups. Addressing these challenges requires, among other things, developing techniques for secure group communication (SGC), secure dynamic conferencing (SDC), differential access control (DIF-AC), and hierarchical access control (HAC). Cryptography and key management have been intensively investigated and widely applied in order to secure information. However, there is a lack of key management mechanisms that are general and flexible enough to address all requirements arising from information transmission and data access. This paper proposes the first holistic group key management scheme that can directly support all of these functions yet retain efficiency. The proposed scheme is based on the innovative concept of the access control polynomial (ACP), which can efficiently and effectively support full dynamics, flexible access control with fine-tuned granularity, and anonymity. The new scheme is immune to various attacks from both external and internal malicious parties.
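A minimal sketch of the access control polynomial idea as we understand it (the modulus, the SHA-256-based point derivation, and the function names below are our assumptions for illustration, not the scheme as specified in the paper): the controller hides the group key in a public polynomial that evaluates to the key exactly at points derivable from authorized members' secrets.

\begin{verbatim}
import hashlib, secrets

P = 2 ** 61 - 1    # assumed public prime modulus

def point(member_secret, nonce):
    """Map a member secret and a public nonce to a point in Z_P."""
    digest = hashlib.sha256(f"{member_secret}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") % P

def publish(member_secrets, group_key):
    """Controller: A(x) = prod (x - x_i) mod P vanishes at every authorized
    member's point x_i; publish the coefficients of P(x) = A(x) + group_key."""
    nonce = secrets.randbelow(P)
    coeffs = [1]                               # low-to-high coefficients
    for s in member_secrets:
        x_i, new = point(s, nonce), [0] * (len(coeffs) + 1)
        for j, c in enumerate(coeffs):
            new[j] = (new[j] - c * x_i) % P
            new[j + 1] = (new[j + 1] + c) % P
        coeffs = new
    coeffs[0] = (coeffs[0] + group_key) % P
    return nonce, coeffs                       # the public rekey message

def derive(my_secret, nonce, coeffs):
    """Member: evaluate the public polynomial at its own point (Horner)."""
    x, val = point(my_secret, nonce), 0
    for c in reversed(coeffs):
        val = (val * x + c) % P
    return val

members = [1111, 2222, 3333]
key = secrets.randbelow(P)
nonce, msg = publish(members, key)
assert all(derive(s, nonce, msg) == key for s in members)
\end{verbatim}

In this sketch, rekeying after a join or leave amounts to choosing a fresh nonce and key and republishing the polynomial over the new membership, leaving the members' secrets untouched.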
Medical Information Systems (MIS) support medical practice and health care significantly. Security and dependability are two increasingly important factors for MIS nowadays. On the one hand, people will be willing to step into the MIS age only when their privacy and integrity can be protected and guaranteed by MIS. On the other hand, only secure and reliable MIS can provide safe and solid medical and health care services to people. In this paper, we discuss new security and reliability technologies that are necessary for, and can be integrated with, existing MIS to make the systems highly secure and dependable. We also present an implemented middleware architecture that has been integrated seamlessly and transparently with the existing VISTA/CPRS system in the U.S. Department of Veterans Affairs.
Key management is a core mechanism for ensuring the security of applications and network services in wireless sensor networks. It includes two aspects: key distribution and key revocation. The goal of key distribution is to establish the required keys between sensor nodes that must exchange data. Key revocation is used to remove compromised sensor nodes from the network. Although many key distribution and key revocation schemes have been proposed in the literature, there is no framework that can integrate them. In this paper, we propose a key management framework, uKeying, for wireless sensor networks using a globally distributed session key. uKeying includes three parts: a security mechanism to provide secrecy for communication in the sensor network, an efficient session key distribution scheme, and a centralized key revocation scheme. The proposed framework does not depend on a specific key distribution scheme and can support many such schemes. We further demonstrate how to use the framework to support secure group communication protocols in wireless sensor networks. Our analysis shows that the framework is secure, efficient, and extensible. The simulation results reveal, for the first time, that a centralized key revocation scheme can also attain high efficiency.
An attack graph is an abstraction that represents the ways an attacker can violate a security policy by leveraging interdependencies among discovered vulnerabilities. Attack graph analyses that extract security-relevant information from the attack graph are referred to as attack graph-based security metrics. Although a number of attack graph-based security metrics have been proposed in the literature, there has been no analysis of how these security metrics behave in response to security incidents. In this dissertation, we examine how attack graph-based security metrics behave in response to increased network vulnerabilities under heterogeneous network models. From this analysis, we identify opportunities for using equations that characterize particular attack graph-based security metrics, avoiding the costly processing of attack graphs.
Security is recognized to be a multidimensional entity. However, all proposed attack graph-based security metrics have been unidimensional. In this dissertation, we provide an approach for aggregating the capabilities of existing attack graph-based security metrics with our proposed suite of attack graph-based security metrics.
Lastly, we specify an algorithm for network hardening under a limited budget. Given a set of network vulnerabilities and a set of candidate countermeasures, a network administrator must choose the set of countermeasures that optimizes security within the budget. Our algorithm produces sets of countermeasures that optimize security with respect to a set of attack graph-based security metrics while staying within budget.
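As a simplified stand-in for the budgeted hardening step (our illustration: the countermeasure names, integer costs, and additive benefit scores are made up, whereas the dissertation scores countermeasure sets with attack graph-based metrics), a 0/1-knapsack dynamic program picks the countermeasure set with the best score that fits the budget.

\begin{verbatim}
def select_countermeasures(candidates, budget):
    """candidates: list of (name, integer cost, benefit score)."""
    best = [(0.0, [])] * (budget + 1)          # best (score, picks) per spend
    for name, cost, benefit in candidates:
        for b in range(budget, cost - 1, -1):  # descend so each item is used once
            score = best[b - cost][0] + benefit
            if score > best[b][0]:
                best[b] = (score, best[b - cost][1] + [name])
    return max(best, key=lambda t: t[0])

picked = select_countermeasures(
    [("patch-web-server", 3, 0.40),
     ("segment-network", 5, 0.55),
     ("add-ids-sensor", 2, 0.20)],
    budget=7,
)
print(picked)   # (0.75, ['segment-network', 'add-ids-sensor']) with these toy numbers
\end{verbatim}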
The ability to quickly detect and locate stealthy radiological dispersal devices (RDDs) allows authorities to disarm and remove the RDDs before they can be detonated. Traditionally, the detection of RDDs was accomplished by using expensive and cumbersome radiation sensors strategically located in the surveillance area. However, with recent advancements in wireless technologies and sensing hardware, deploying a large-scale sensor network with small sensors is now becoming a reality. In this dissertation, we study methods to detect and locate radiation sources quickly and accurately using a network of sensors.
Localization of a single radiation source can be achieved by using three sensors in a noise- and error-free environment. When both noise and errors are considered, we present a closed-form solution that outperforms existing algorithms. When more than three sensors are available, we present an efficient algorithm to exploit the additional sensor data, in order to further improve the robustness and accuracy of the localization.
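The inverse-square intensity model behind such localization is easy to illustrate. The sketch below is our own stand-in, not the closed-form solution from the dissertation: it assumes count rates follow $c_i \approx A / \lVert x - p_i \rVert^2$ and recovers the source position and strength by least squares over a coarse grid, using synthetic noise-free readings; the function name and grid parameters are invented for the example.

\begin{verbatim}
def localize(sensors, readings, xmax, ymax, step=0.5):
    """Grid-search least-squares fit of a single source under c ~ A / d^2."""
    best = None
    x = 0.0
    while x <= xmax:
        y = 0.0
        while y <= ymax:
            d2 = [max((x - sx) ** 2 + (y - sy) ** 2, 1e-9) for sx, sy in sensors]
            # Best-fitting strength A for this candidate position (closed form).
            A = sum(c / d for c, d in zip(readings, d2)) / sum(1 / d ** 2 for d in d2)
            err = sum((c - A / d) ** 2 for c, d in zip(readings, d2))
            if best is None or err < best[0]:
                best = (err, (x, y), A)
            y += step
        x += step
    return best[1], best[2]

# Synthetic data: four corner sensors, true source at (3, 4) with strength 100.
sensors = [(0, 0), (10, 0), (0, 10), (10, 10)]
readings = [100 / ((3 - sx) ** 2 + (4 - sy) ** 2) for sx, sy in sensors]
print(localize(sensors, readings, xmax=10, ymax=10))   # roughly ((3.0, 4.0), 100)
\end{verbatim}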
To localize multiple sources in a sensor network, we propose a hybrid formulation of a particle filter with a mean-shift technique, in order to achieve several important features that address major challenges faced by existing multiple-source localization algorithms. First, our algorithm maintains a constant number of estimation (source) parameters even as the number of radiation sources $K$ increases. Second, our algorithm “learns” the number of sources from the estimated source parameters instead of relying on expensive statistical estimation. Third, the presence of obstacles may improve the localization accuracy of our algorithm, whereas obstacles significantly degrade the accuracy of existing algorithms.
When no radiation source is present, localization algorithms produce false positives because they assume that a radiation source is present. We propose the Localization Enhanced Detection (LED) method, which decides whether a source with the estimated parameters is present or absent using a close-to-minimal number of measurements, while maintaining the false positive and false negative rates below a specified level. We evaluate the LED method using simulation and testbed experiments, and we compare its effectiveness with that of existing detection methods.
We build a cross-platform, cross-language, and versatile software framework that provides an abstraction for interfacing with sensors and supports building applications on radiation source localization. The software framework implements various localization algorithms that are ready to be deployed in an actual system. The components in the software framework are loosely coupled and are general enough to support application domains beyond radiation source localization. We demonstrate the versatility of the software framework in building the Rapid Structural Assessment Network.