Reports and Papers Archive

Browse All Papers »       Submit A Paper »

Differentially private data publishing for data analysis

CERIAS TR 2016-12
Dong Su

In the information age, vast amounts of sensitive personal information are collected by companies, institutions and governments. A key technological challenge is how to design mechanisms for effectively extracting knowledge from data while preserving the privacy of the individuals involved. In this dissertation, we address this challenge from the perspective of differentially private data publishing. Firstly, we propose PrivPfC, a differentially private method for releasing data for classification. The key idea underlying PrivPfC is to privately select, in a single step, a grid, which partitions the data domain into a number of cells. This selection is done using the exponential mechanism with a novel quality function, which maximizes the expected number of correctly classified records by a histogram classifier. PrivPfC supports both the binary classification as well as the multiclass classification. Secondly, we study the problem of differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of the DPLloyd algorithm which is a differentially private version of the Lloyd algorithm and propose a non-interactive approach EUGkM which publishes a differentially private synopsis for k-means clustering. We also propose a hybrid approach that combines the advantages of the improved version of DPLloyd and EUGkM. Finally, we investigate the sparse vector technique (SVT) which is a fundamental technique for satisfying differential privacy in answering a sequence of queries. We propose a new version of SVT that provides better utility by introducing an effective technique to improve the performance of SVT in the interactive setting. We also show that in the non-interactive setting (but not the interactive setting), usage of SVT can be replaced by the exponential mechanism.

Added 2017-12-07

Compiler-Based Mitigations of Vulnerabilities in Systems Software

CERIAS TR 2017-6
Scott A. Carr

Systems software written in C/C++ is plagued by bugs, which attackers exploit to gain control of systems, leak sensitive data, or perform denial-of-service attacks. This plethora of vulnerabilities is caused by C/C++ not enforcing memory or type safety in language by design, instead they leave security checks to the programmer. ^ Previous research primarily focuses on preventing control-flow hijack attacks. In a control-flow hijack attack, the attacker manipulates a return address or function pointer to cause code of her choosing to be executed. Abadi et al. propose Control- Flow Integrity (CFI), to prevent such attacks, but as our CFI survey shows, CFI mechanisms have varying degrees of precision. Researchers exploit the imprecision in CFI implementations to evade their protection. One area of imprecision in CFI mechanisms is virtual functions in C++ programs. Attackers can re-target virtual function calls to other invalid functions as part of an exploit. Our work, VTrust, provides specialized protection for C++ virtual functions with low overhead. ^ As CFI mechanisms improve, and are widely deployed, attackers will follow the path of least resistance towards other attack vectors, e.g., non-control-data attacks. In a non-control-data attack the attacker manipulates ordinary variables (not return addresses, function pointers, etc.) to carry out the attack. Non-control-data attacks are not prevented by CFI, because the control-flow follows a valid path in the original program. The attack is carried out by modifying only non-control-data. To address this emerging problem, we have developed Data Confidentiality and Integrity (DCI) which allows the programmer to select which data types should be protected from corruption and information leakage by the attacker. ^ In this dissertation, we propose that by using static analysis and runtime checks, we can prevent attacks targeted at sensitive data with low overhead. We have evaluated our techniques, VTrust and DCI, on the SPEC CPU2006 benchmarks, the Firefox web browser, and the mbedTLS cryptographic library. Our results show our implementations have lower performance overhead than other state-of-the-art mechanisms. In our security evaluation, we have several case studies which show our defenses mitigate publicly disclosed vulnerabilities in widely deployed software. In future work, we plan to improve our static sensitivity analysis for DCI and investigate new methods for automatically identifying sensitive data.

Added 2017-12-07

PrivBioMTAuth: Privacy Preserving Biometrics-Based and User Centric Protocol for User Authentication from Mobile Phones

CERIAS TR 2017-4
Hasini Gunasinghe, Elisa Bertino
Download: PDF

We introduce a privacy preserving biometrics-based authentication solution by which users can authenticate to different service providers from mobile phones without involving identity providers in the transactions.  Authentication is performed via zero-knowledge proof of knowledge, based on a cryptographic identity token that encodes the biometric identifier of the user and a secret provided by the user, making it three-factor authentication. Our approach for generating a unique, repeatable and revocable biometric identifier from the user’s biometric image is based on a machine learning based classification technique which involves the features extracted from the user’s biometric image. We have implemented a prototype of the proposed authentication solution and evaluated our solution with respect to its performance, security and privacy. The evaluation has been performed on a public dataset of face images.

Added 2017-11-10

TOPHAT: Topology-based Host-Level Attribution for Multi-Stage Attacks in Enterprise Systems using Software Defined Networks

CERIAS TR 2017-4
Subramaniyam Kannan,Paul Wood, Larry Deatrick, Patricia Beane,Somali Chaterji, and Saurabh Bagchi

Multi-layer distributed systems, such as those found in corporate systems, are often the target of multi-stage attacks. Such attacks utilize multiple victim machines, in a series, to compromise a target asset deep inside the corporate network. Under such attacks, it is difficult to identify the upstream attacker’s identity from a downstream victim machine because of the mixing of multiple network flows. This is known as the attribution problem in security domains. We present TopHat, a system that solves such attribution problems for multi-stage attacks. It does this by using moving target defense, i.e., shuffling the assignment of clients to server replicas, which is achieved through software defined networking. As alerts are generated, TopHat maintains state about the level of risk for each network flow and progressively isolates the malicious flows. Using a simulation, we show that TopHat can identify single and multiple attackers in a variety of systems with different numbers of servers, layers, and clients.

Added 2017-08-17

The Application of Deception to Software Security Patching

CERIAS TR 2017-3
Jeffrey K. Avery
Download: PDF

Deception has been used for thousands of years to influence thoughts.  Comparatively, deception has been used in computing since the 1970s.  Its application to security has been documented in a variety of studies and products on the market, but continues to evolve with new research and tools.

There has been limited research regarding the application of deception to software patching in non-real time systems.  Developers and engineers test programs and applications before deployment, but they cannot account for every flaw that may occur during the Software Development Lifecycle (SDLC).  Thus, throughout an application’s lifetime, patches must be developed and distributed to improve appearance, security, and/or performance.  Given a software security patch, an attacker can find the exact line(s) of vulnerable code in unpatched versions and develop an exploit without meticulously reviewing source code, thus lightening the workload to develop an attack.  Applying deceptive techniques to software security patches as part of the defensive strategy can increase the workload necessary to use patches to develop exploits.

Introducing deception into security patch development makes attackers’ jobs more difficult by casting doubt on the validity of the data they receive from their exploits.  Software security updates that use deception to influence attackers’ decision making and exploit generation are called deceptive patches.  Deceptive patching techniques could include inserting fake patches, making real patches confusing, and responding falsely to requests as if the vulnerability still exists.  These could increase attackers’ time spent attempting to discover, exploit and validate vulnerabilities and provide defenders information about attackers’ habits and targets.

This dissertation presents models, implementations, and analysis of deceptive patches to show the impact of deception on code analysis and an attacker’s exploit generation process. Our implementation shows that deceptive patches do increase the workload necessary to analyze programs.  The analysis of the generated models show that deceptive patches inhibit various phases of attacker’s exploit generation process.  Thus, we show that it is feasible to introduce deception into the software patching lifecycle to influence attacker decision making.

Added 2017-08-03

Security Techniques for Sensor Systems and the Internet of Things

CERIAS TR 2016-10
Daniele Midi
Download: PDF
Added 2017-06-12

A Provenance-Aware Multi-Dimensional Reputation System For Online Rating Systems

CERIAS TR 2017-2
Mohsen Rezvani, Aleksandar Ignjattovic, Elisa Bertino
Download: PDF

Online rating systems are widely accepted as means for quality assessment on the web and users increasingly rely on these systems when deciding to purchase an item online. This makes such rating systems frequent targets of attempted manipulation by posting unfair rating scores. Therefore, providing useful, realistic rating scores as well as detecting unfair behavior are both of very high importance. Existing solutions are mostly majority based, also employing temporal analysis and clustering techniques. However, they are still vulnerable to unfair ratings. They also ignore distances between options, the provenance of information and different dimensions of cast rating scores while computing aggregate rating scores and trustworthiness of users. In this paper, we propose a robust iterative algorithm which leverages information in the profile of users and provenance of information and which takes into account the distance between options to provide both more robust and informative rating scores for items and trustworthiness of users. We also prove convergence of iterative ranking algorithms under very general assumptions which are satisfied by the algorithm proposed in this paper. We have implemented and tested our rating method using both simulated data as well as four real-world datasets from various applications of reputation systems. The experimental results demonstrate that our model provides realistic rating scores even in the presence of a massive amount of unfair ratings and outperforms the well-known ranking algorithms.

Added 2017-06-06

Automated Differential Testing for Energy-Efficient Control Software

CERIAS TR 2018-01
Hongjun Choi
Download: PDF

Cyber-physical systems (CPS) are integrated systems of computer-based algorithms and physical components interacting with environmental effects. In such systems, autonomous behaviors and overall performance mainly depend on a control software. Thus, it is crucial to test and analyze the control software of the CPS in various perspectives. One of the critical perspectives is energy efficiency because many cyber-physical systems (e.g. unmanned aerial vehicles, autonomous cars, health-care devices) operate with limited energy sources such as batteries. In this paper, we propose CPSDiff: an energy-aware differential testing framework that generates test inputs to expose the maximal difference between two control programs in energy consumption. Our test generation technique uses meta-heuristic searching to find the input that maximizes the energy consumption difference. The difference-revealing ability of our technique outperforms the random search algorithm and hill-climbing search algorithm. Our evaluation of two popular unmanned aerial vehicle control programs provides a detailed comparison of their energy consumption under the same condition with a universal robotics simulator; CPSDiff found the input which exposes maximum battery consumption difference of around 47%.

Added 2017-05-15

Privacy-Preserving Analysis with Applications to Textual Data

CERIAS TR 2017-01
Balamurugan Anandan
Download: PDF

Textual data plays a very important role in decision making and scientific research, but cannot be shared freely if they contain personally identifiable information. In this dissertation, we consider the problem of privacy-preserving text analysis, while satisfying a strong privacy definition of differential privacy.

We first show how to build a two-party differentially private secure protocol for computing similarity of text in the presence of malicious adversaries.  We then relax the utility requirement of computational differential privacy to reduce computational cost, while still giving security with rational adversaries.

Next, we consider the problem of building a data-oblivious algorithm for minimum weighted matching in bipartite graphs, which has applications to computing secure semantic similarity of documents. We also propose a secure protocol for detecting articulation points in graphs. We then relax the strong data-obliviousness definition to give $\epsilon$-data-obliviousness based on the notion of indistinguishability, which helps us to develop efficient protocols for data-dependent algorithms like frequent itemset mining.

Finally, we consider the problem of privacy-preserving classification of text. A main problem in developing private protocols for unstructured data is high dimensionality. This dissertation tackles high dimensionality by means of differentially private feature selection. We show that some of the well known feature selection techniques perform poorly due to high sensitivity and we propose techniques that perform well in a differential private setting. The feature selection techniques are empirically evaluated using differentially private classifiers: na\”{i}ve Bayes, support vector machine and decision trees.

Added 2017-05-13

Securing Cloud-Based Data Analytics: A Practical Approach

CERIAS TR 2016-9
Julian James Stephen
Download: PDF

The ubiquitous nature of computers is driving a massive increase in the amount of data generated by humans and machines. The shift to cloud technologies is a paradigm change that offers considerable financial and administrative gains in the ef- fort to analyze these data. However, governmental and business institutions wanting to tap into these gains are concerned with security issues. The cloud presents new vulnerabilities and is dominated by new kinds of applications, which calls for new security solutions. In the direction of analyzing massive amounts of data, tools like MapReduce, Apache Storm, Dryad and higher-level scripting languages like Pig Latin and DryadLINQ have significantly improved corresponding tasks for software devel- opers. The equally important aspect of securing computations performed by these tools and ensuring confidentiality of data has seen very little support emerge for programmers. In this dissertation, we present solutions to a. secure computations being run in the cloud by leveraging BFT replication coupled with fault isolation and b. se- cure data from being leaked by computing directly on encrypted data. For securing computations (a.), we leverage a combination of variable-degree clustering, approx- imated and offline output comparison, smart deployment, and separation of duty to achieve a parameterized tradeoff between fault tolerance and overhead in prac- tice. We demonstrate the low overhead achieved with our solution when securing data-flow computations expressed in Apache Pig, and Hadoop. Our solution allows assured computation with less than 10 percent latency overhead as shown by our evaluation. For securing data (b.), we present novel data flow analyses and program xi transformations for Pig Latin and Apache Storm, that automatically enable the ex- ecution of corresponding scripts on encrypted data. We avoid fully homomorphic encryption because of its prohibitively high cost; instead, in some cases, we rely on a minimal set of operations performed by the client. We present the algorithms used for this translation, and empirically demonstrate the practical performance of our approach as well as improvements for programmers in terms of the effort required to preserve data confidentiality.

Added 2017-05-09

Hardware Accelerated Lattice Based Authentication

Anand Mudgerikar

The Internet of Things (IoT) is growing at a rapid pace and it is essential that the communication between these resource constrained IoT devices is secure and efficient. Traditional authentication schemes like RSA and ECDSA do not conform to the high throughput and time critical needs of these devices. In this research, we propose a hardware accelerated lattice based authentication scheme tailored specifically towards constrained IoT devices. We show that our scheme provides lower latency and higher throughput compared to traditional authentication schemes and its CPU only counter-parts.

Added 2017-03-09

Convicted by Memory: Automatically Recovering Spatial-Temporal Evidence from Memory Images

Brendan Dominic Saltaformaggio

Memory forensics can reveal “up to the minute” evidence of a device’s usage, often without requiring a suspect’s password to unlock the device, and it is oblivious to any persistent storage encryption schemes, e.g., whole disk encryption. Prior to my work, researchers and investigators alike considered data-structure recovery the ultimate goal of memory image forensics. This, however, was far from sufficient, as investigators were still largely unable to understand the content of the recovered evidence, and hence efficiently locating and accurately analyzing such evidence locked in memory images remained an open research challenge. In this dissertation, I propose breaking from traditional data-recovery-oriented forensics, and instead I present a memory forensics framework which leverages pro- gram analysis to automatically recover spatial-temporal evidence from memory im- ages by understanding the programs that generated it. This framework consists of four techniques, each of which builds upon the discoveries of the previous, that repre- sent this new paradigm of program-analysis-driven memory forensics. First, I present DSCRETE, a technique which reuses a program’s own interpretation and rendering logic to recover and present in-memory data structure contents. Following that, VCR developed vendor-generic data structure identification for the recovery of in-memory photographic evidence produced by an Android device’s cameras. GUITAR then re- alized an app-independent technique which automatically reassembles and redraws an app’s GUI from the multitude of GUI data elements found in a smartphone’s memory image. Finally, different from any traditional memory forensics technique, ix RetroScope introduced the vision of spatial-temporal memory forensics by retarget- ing an Android app’s execution to recover sequences of previous GUI screens, in their original temporal order, from a memory image. This framework, and the new program analysis techniques which enable it, have introduced encryption-oblivious forensics capabilities far exceeding traditional data-structure recovery.

Added 2017-02-16

Rating Maintenance Phase Program Document

National Computer Security Association
Added 2017-01-19