Considerable attention has been given to the vulnerability of machine learning to adversarial samples. This is particularly critical in anomaly detection; uses such as detecting fraud, intrusion, and malware must assume a malicious adversary. We specically address poisoning attacks, where the adversary injects carefully crafted benign samples into the data, leading to concept drift that causes the anomaly detection to misclassify the actual attack as benign. Our goal is to estimate the vulnerability of an anomaly detection method to an unknown attack, in particular the expected minimum number of poison samples the adversary would need to succeed. Such an estimate is a necessary step in risk analysis: do we expect the anomaly detection to be suciently robust to be useful in the face of attacks? We analyze DBSCAN, LOF, one-class SVM as an anomaly detection method, and derive estimates for robustness to poisoning attacks. The analytical estimates are validated against the number of poison samples needed for the actual anomalies in standard anomaly detection test datasets. We then develop defense mechanism, based on the concept drift caused by the poisonous points, to identify that an attack is underway. We show that while it is possible to detect the attacks, it leads to a degradation in the performance of the anomaly detection method. Finally, we investigate whether the generated adversarial samples for one anomaly detection method transfer to another anomaly detection method.
More than ever, information system designers must provide security protection against a wide variety of threats. While numerous sources of guidance are available to inform the design process, system architects often improvise their own design methods. This paper aims to distil the experience gained by NSA trusted system analysts over decades so that it that can be practically applied by others. The general approach is to identify and reduce the number of assumptions on which the security of the system depends. Simply making these assumptions explicit and showing their interdependence has significant, albeit difficult to quantify, benefits for system security. Our hope is that this design methodology will serve as the starting point for the development of a more formal and robust engineering methodology for trusted system design.
This dissertation introduces a scorecard to enable the State of Indiana to measure the cybersecurity of its public and private critical infrastructure and key resource sector organizations. The scorecard was designed to be non-threatening and understandable so that even small organizations without cybersecurity expertise can voluntarily self-asses their cybersecurity strength and weaknesses. The scorecard was also intended to enable organizations to learn, so that they may identify and self-correct their cybersecurity vulnerabilities. The scorecard provided quantifiable feedback to enable organizations to benchmark their initial status and measure their future progress.
Using the scorecard, the Indiana Executive Council for Cybersecurity launched a Pilot to measure cybersecurity of large, medium, and small organizations across eleven critical infrastructure and key resources sectors. This dissertation presents the analysis and results from scorecard data provided by the Pilot group of 56 organizations. The cybersecurity scorecard developed as part of this dissertation has been included in the Indiana Cybersecurity Strategy Plan published September 21, 2018.
User’s digital identity information has privacy and security requirements. Privacy requirements include confidentiality of the identity information itself, anonymity of those who verify and consume a user’s identity information and unlinkability of online transactions which involve a user’s identity. Security requirements include correctness, ownership assurance and prevention of counterfeits of a user’s identity information. Such privacy and security requirements, although conflict in nature, are critical for identity management systems enabling the exchange of users’ identity information between different parties during the execution of online transactions. Addressing all such requirements, without a centralized party managing the identity exchange transactions, raises several challenges. This paper presents a decentralized protocol for privacy preserving exchange of users’ identity information addressing such challenges. The proposed protocol leverages advances in blockchain and zero knowledge proof technologies, as the main building blocks. We provide prototype implementations of the main building blocks of the protocol and assess its performance and security.
Renewable energy resources challenge traditional energy system operations by substituting the stability and predictability of fossil fuel based generation with the unreliability and uncertainty of wind and solar power. Rising demand for green energy drives grid operators to integrate sensors, smart meters, and distributed control to compensate for this uncertainty and improve the operational efficiency of the grid. Real-time negotiations enable producers and consumers to adjust power loads during shortage periods, such as an unexpected outage or weather event, and to adapt to time-varying energy needs. While such systems improve grid performance, practical implementation challenges can derail the operation of these distributed cyber-physical systems. Network disruptions introduce instability into control feedback systems, and strategic adversaries can manipulate power markets for financial gain. This dissertation analyzes the impact of these outages and adversaries on cyber-physical systems and provides methods for improving resilience, with an emphasis on distributed energy systems. ^ First, a financial model of an interdependent energy market lays the groundwork for profit-oriented attacks and defenses, and a game theoretic strategy optimizes attack plans and defensive investments in energy systems with multiple independent actors. Then attacks and defenses are translated from a theoretical context to a real-time energy market via denial of service (DoS) outages and moving target defenses. Analysis on two market mechanisms shows how adversaries can disrupt market operation, destabilize negotiations, and extract profits by attacking network links and disrupting communication. Finally, a low-cost DoS defense technique demonstrates a method that energy systems may use to defend against attacks.
Unauthorized data destruction results in a loss of digital information and services, a devastating issue for society and commerce that rely on the availability and integrity of such systems. Remote adversaries who seek to destroy or alter digital information persistently study the protection mechanisms and craft attacks that circumvent defense mechanisms such as data back-up or recovery. This dissertation evaluates the use of deception to enhance the preservation of data under the threat of unauthorized data destruction attacks. The motivation for the proposed solution is two-fold. (i) An honest and consistent view of the preservation mechanisms are observable and often controlled from within the system under protection, allowing the adversary to identify an appropriate attack for the given system. (ii) The adversary relies on some underlying I/O system to facilitate destruction and assumes that the components operate according to a confirmation bias based on prior interactions with similar systems. A deceptive memory system, DecMS, masks the presence of data preservation and mimics a system according to the adversary’s confirmation bias. Two proofs of concepts and several destructive threat instances evaluate the feasibility of a DecMS. The first proof of concept, DecMS-Kernel, uses rootkits’ stealth mechanisms to mask the presence of DecMS and impede potential destructive writes to enable preservation of data before destruction. The experimental results show that DecMS is effective against two common secure delete tools and an application that mimics crypto ransomware methods.
Mobile app poses both traditional and new potential threats to system security and user privacy. There are malicious apps that may do harm to the system, and there are mis-behaviors of apps, which are reasonable and legal when not abused, yet may lead to real threats otherwise. Moreover, due to the nature of mobile apps, a running app in mobile devices may be only part of the software, and the server side behavior is usually not covered by analysis. Therefore, direct analysis on the app itself may be incomplete and additional sources of information are needed. In this dissertation, we discuss how we can apply machine learning techniques in multiple tasks for security issues in regard of mobile apps in the Android platform. These include malicious apps detection and security risk estimation of apps. Both direct sources of information from the developer of apps and indirect sources of information from user comments are utilized in these tasks. We also propose comparison of these different sources in the task of security risk estimation to point out the necessity of usage of indirect sources in mobile app security tasks.
In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid, which partitions the data domain into a number of cells. This selection is done using the exponential mechanism with a novel quality function, which maximizes the expected number of correctly classified records by a histogram classifier. PrivPfC supports both the binary classification as well as the multiclass classification. Secondly, we study the problem of differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of the DPLloyd algorithm which is a differentially private version of the Lloyd algorithm and propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. We also propose a hybrid approach that combines the advantages of the improved version of DPLloyd and EUGkM. Finally, we investigate the sparse vector technique (SVT) which is a fundamental technique for satisfying differential privacy in answering a sequence of queries. We propose a new version of SVT that provides better utility by introducing an effective technique to improve the performance of SVT in the interactive setting. We also show that in the non-interactive setting (but not the interactive setting), usage of SVT can be replaced by the exponential mechanism.
Systems software written in C/C++ is plagued by bugs, which attackers exploit to gain control of systems, leak sensitive data, or perform denial-of-service attacks. This plethora of vulnerabilities is caused by C/C++ not enforcing memory or type safety in language by design, instead they leave security checks to the programmer. ^ Previous research primarily focuses on preventing control-flow hijack attacks. In a control-flow hijack attack, the attacker manipulates a return address or function pointer to cause code of her choosing to be executed. Abadi et al. propose Control- Flow Integrity (CFI), to prevent such attacks, but as our CFI survey shows, CFI mechanisms have varying degrees of precision. Researchers exploit the imprecision in CFI implementations to evade their protection. One area of imprecision in CFI mechanisms is virtual functions in C++ programs. Attackers can re-target virtual function calls to other invalid functions as part of an exploit. Our work, VTrust, provides specialized protection for C++ virtual functions with low overhead. ^ As CFI mechanisms improve, and are widely deployed, attackers will follow the path of least resistance towards other attack vectors, e.g., non-control-data attacks. In a non-control-data attack the attacker manipulates ordinary variables (not return addresses, function pointers, etc.) to carry out the attack. Non-control-data attacks are not prevented by CFI, because the control-flow follows a valid path in the original program. The attack is carried out by modifying only non-control-data. To address this emerging problem, we have developed Data Confidentiality and Integrity (DCI) which allows the programmer to select which data types should be protected from corruption and information leakage by the attacker. ^ In this dissertation, we propose that by using static analysis and runtime checks, we can prevent attacks targeted at sensitive data with low overhead. We have evaluated our techniques, VTrust and DCI, on the SPEC CPU2006 benchmarks, the Firefox web browser, and the mbedTLS cryptographic library. Our results show our implementations have lower performance overhead than other state-of-the-art mechanisms. In our security evaluation, we have several case studies which show our defenses mitigate publicly disclosed vulnerabilities in widely deployed software. In future work, we plan to improve our static sensitivity analysis for DCI and investigate new methods for automatically identifying sensitive data.
We introduce a privacy preserving biometrics-based authentication solution by which users can authenticate to different service providers from mobile phones without involving identity providers in the transactions. Authentication is performed via zero-knowledge proof of knowledge, based on a cryptographic identity token that encodes the biometric identifier of the user and a secret provided by the user, making it three-factor authentication. Our approach for generating a unique, repeatable and revocable biometric identifier from the user’s biometric image is based on a machine learning based classification technique which involves the features extracted from the user’s biometric image. We have implemented a prototype of the proposed authentication solution and evaluated our solution with respect to its performance, security and privacy. The evaluation has been performed on a public dataset of face images.
Multi-layer distributed systems, such as those found in corporate systems, are often the target of multi-stage attacks. Such attacks utilize multiple victim machines, in a series, to compromise a target asset deep inside the corporate network. Under such attacks, it is difficult to identify the upstream attacker’s identity from a downstream victim machine because of the mixing of multiple network flows. This is known as the attribution problem in security domains. We present TopHat, a system that solves such attribution problems for multi-stage attacks. It does this by using moving target defense, i.e., shuffling the assignment of clients to server replicas, which is achieved through software defined networking. As alerts are generated, TopHat maintains state about the level of risk for each network flow and progressively isolates the malicious flows. Using a simulation, we show that TopHat can identify single and multiple attackers in a variety of systems with different numbers of servers, layers, and clients.
Deception has been used for thousands of years to influence thoughts. Comparatively, deception has been used in computing since the 1970s. Its application to security has been documented in a variety of studies and products on the market, but continues to evolve with new research and tools.
There has been limited research regarding the application of deception to software patching in non-real time systems. Developers and engineers test programs and applications before deployment, but they cannot account for every flaw that may occur during the Software Development Lifecycle (SDLC). Thus, throughout an application’s lifetime, patches must be developed and distributed to improve appearance, security, and/or performance. Given a software security patch, an attacker can find the exact line(s) of vulnerable code in unpatched versions and develop an exploit without meticulously reviewing source code, thus lightening the workload to develop an attack. Applying deceptive techniques to software security patches as part of the defensive strategy can increase the workload necessary to use patches to develop exploits.
Introducing deception into security patch development makes attackers’ jobs more difficult by casting doubt on the validity of the data they receive from their exploits. Software security updates that use deception to influence attackers’ decision making and exploit generation are called deceptive patches. Deceptive patching techniques could include inserting fake patches, making real patches confusing, and responding falsely to requests as if the vulnerability still exists. These could increase attackers’ time spent attempting to discover, exploit and validate vulnerabilities and provide defenders information about attackers’ habits and targets.
This dissertation presents models, implementations, and analysis of deceptive patches to show the impact of deception on code analysis and an attacker’s exploit generation process. Our implementation shows that deceptive patches do increase the workload necessary to analyze programs. The analysis of the generated models show that deceptive patches inhibit various phases of attacker’s exploit generation process. Thus, we show that it is feasible to introduce deception into the software patching lifecycle to influence attacker decision making.
Online rating systems are widely accepted as means for quality assessment on the web and users increasingly rely on these systems when deciding to purchase an item online. This makes such rating systems frequent targets of attempted manipulation by posting unfair rating scores. Therefore, providing useful, realistic rating scores as well as detecting unfair behavior are both of very high importance. Existing solutions are mostly majority based, also employing temporal analysis and clustering techniques. However, they are still vulnerable to unfair ratings. They also ignore distances between options, the provenance of information and different dimensions of cast rating scores while computing aggregate rating scores and trustworthiness of users. In this paper, we propose a robust iterative algorithm which leverages information in the profile of users and provenance of information and which takes into account the distance between options to provide both more robust and informative rating scores for items and trustworthiness of users. We also prove convergence of iterative ranking algorithms under very general assumptions which are satisfied by the algorithm proposed in this paper. We have implemented and tested our rating method using both simulated data as well as four real-world datasets from various applications of reputation systems. The experimental results demonstrate that our model provides realistic rating scores even in the presence of a massive amount of unfair ratings and outperforms the well-known ranking algorithms.