Agencies and other organizations often need to publish microdata, e.g., medical data or census data, for research and other purposes. While the released datasets provide valuable information to researchers, they also contain sensitive information about individuals whose privacy may be at risk. To reduce disclosure risks, one approach is to anonymize the microdata before it is released. Research in data anonymization aims to limit disclosure risks to an acceptable level while maximizing data utility. In this project, we study several fundamental issues in balancing privacy with utility in microdata publishing. Some of the research directions are as follows. First, existing privacy requirements in data publishing, such as k-anonymity, l-diversity, and t-closeness, all have limitations and shortcomings in protecting against attribute disclosure while preserving data utility. We work on building a robust and effective privacy requirement. Second, when the adversary has additional background knowledge about the dataset, she can make more precise inferences about individuals' sensitive attribute values. We study approaches to modeling the adversary's background knowledge and techniques for preventing background-knowledge attacks. Third, little existing research studies the anonymization of datasets that are continuously updated. Such a dynamic setting requires defining a new notion of privacy and developing techniques to achieve it. Finally, a careful study of the privacy/utility trade-off will help us better understand the whole data publishing process.
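As a minimal illustration of one of the privacy requirements mentioned above, the following Python sketch checks whether a small table satisfies k-anonymity with respect to a chosen set of quasi-identifier attributes. The table, attribute names, and value of k are hypothetical, and the sketch only demonstrates the standard k-anonymity condition, not the new privacy requirement developed in this project.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records of the table."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical anonymized microdata (quasi-identifiers: ZIP prefix and age group).
table = [
    {"zip": "479**", "age": "20-29", "disease": "flu"},
    {"zip": "479**", "age": "20-29", "disease": "cancer"},
    {"zip": "478**", "age": "30-39", "disease": "flu"},
    {"zip": "478**", "age": "30-39", "disease": "hepatitis"},
]

print(is_k_anonymous(table, ["zip", "age"], k=2))  # True
```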
Detecting packet drop attacks is important for the security of mobile ad hoc networks (MANETs), and existing random-audit-based mechanisms cannot detect collaborative attacks. In this paper, we design a hash-function-based method to generate node behavioral proofs that incorporate information from both the data traffic and the forwarding paths. The new method is robust against the collaborative attacks described in the paper and introduces only limited computational overhead on intermediate nodes. We investigate the security of the proposed approach and design schemes to further reduce the overhead.
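The abstract does not give the construction details, so the following Python sketch shows one plausible way a chained, hash-based behavioral proof could bind packet contents to the forwarding path; the field layout, chaining order, and node identifiers are assumptions for illustration, not the scheme designed in the paper.

```python
import hashlib

def behavioral_proof(node_id, prev_proof, packet_payload):
    """Each forwarding node folds its identity and the packet it forwarded
    into a running hash, tying the proof to both the traffic and the path."""
    h = hashlib.sha256()
    h.update(prev_proof)
    h.update(node_id.encode())
    h.update(packet_payload)
    return h.digest()

# Hypothetical path: source -> A -> B -> destination.
payload = b"example packet"
proof = hashlib.sha256(payload).digest()      # proof seeded by the source
for node in ["A", "B"]:
    proof = behavioral_proof(node, proof, payload)

print(proof.hex())  # the destination recomputes this value to audit forwarding
```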
Current database management systems require all data to be modeled as precise values. However, there are many application domains where data values are imprecise or uncertain. Examples of such data include sensor measurements, locations of moving objects, and experimental data. For these applications, there is a need for a database management system that supports uncertain data types.
The project aims to develop a comprehensive database management system for storing and querying uncertain, or imprecise, data. The project encompasses a model for uncertain data built upon the relational model, an extension of SQL to support probabilistic queries over uncertain data, techniques for efficient and accurate evaluation of probabilistic queries, and the development of a prototype system. The specific optimization issues addressed include indexing, join algorithms, and query optimization for uncertain data.
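As a toy illustration of the kind of probabilistic query the project targets (not the data model or SQL extension being developed), the following Python sketch represents an uncertain attribute as a discrete probability distribution and answers a threshold query: return the tuples whose probability of exceeding a query value is at least some threshold. All names and numbers are made up.

```python
# Each tuple's uncertain attribute is a discrete distribution {value: probability}.
sensors = {
    "s1": {20.0: 0.5, 25.0: 0.5},
    "s2": {18.0: 0.9, 30.0: 0.1},
    "s3": {26.0: 0.7, 28.0: 0.3},
}

def prob_greater_than(dist, query_value):
    """Probability that the uncertain attribute exceeds query_value."""
    return sum(p for v, p in dist.items() if v > query_value)

def threshold_query(tuples, query_value, threshold):
    """Return the ids whose probability of exceeding query_value
    meets or exceeds the given threshold."""
    return [tid for tid, dist in tuples.items()
            if prob_greater_than(dist, query_value) >= threshold]

print(threshold_query(sensors, 24.0, 0.5))  # ['s1', 's3']
```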
The prototype will be developed as an extension of the open-source PostgreSQL database management system. A realistic moving-objects application is targeted for testing the prototype. In addition, collaboration with experts in biology and chemistry will serve to validate the applicability of the developed techniques in those domains.
The project is expected to have a significant impact on application domains that need an uncertain data management system, as well as on the database community. The proposal is expected to provide a single model for multiple types of uncertainty and to deliver indexing, join, and query optimization techniques for uncertain data.
The goal of this project is to develop anomaly detection techniques, specifically tailored to database systems, that detect anomalous behavior of applications and users. An approach has been developed based on mining the SQL queries stored in database audit log files. The result of the mining process is used to form profiles that model normal database access behavior and identify intruders. We consider two different scenarios. In the first, we assume that the database has a Role-Based Access Control (RBAC) model in place. Under an RBAC system, permissions are associated with roles, each grouping several users, rather than with individual users. Our intrusion detection (ID) system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected. An important advantage of an ID technique specifically tailored to RBAC databases is that it can help protect against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with large user populations. In the second scenario, we assume that no roles are associated with the users of the database. In this case, we look directly at the behavior of individual users and employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.
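As a simplified sketch of the role-free scenario, the following Python example represents each logged query by a small feature vector, clusters the vectors to form profiles of normal behavior, and flags queries that fall far from every profile center. The feature choice, the use of k-means, and the distance threshold are illustrative assumptions, not the exact features or algorithms used in this project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature vectors extracted from audit-log queries:
# (tables touched, projected columns, predicates in the WHERE clause).
train = np.array([
    [1, 2, 1], [1, 3, 1], [1, 2, 2],   # typical short SELECTs
    [2, 5, 3], [2, 6, 3], [2, 5, 2],   # typical reporting joins
])

# Cluster the historical queries into concise profiles of normal behavior.
profiles = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)

def is_anomalous(query_features, threshold=3.0):
    """Flag a query whose distance to every profile center exceeds the threshold."""
    distances = profiles.transform(np.array([query_features]))[0]
    return distances.min() > threshold

print(is_anomalous([1, 3, 2]))    # False: close to the short-SELECT profile
print(is_anomalous([8, 40, 12]))  # True: unlike any learned profile
```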
VoIP applications have gained popularity because of their greatly reduced cost and wider range of advanced services compared to traditional telephone networks. However, spit (Spam over Internet Telephony), i.e., unsolicited bulk calls sent via VoIP networks, is becoming a major problem that could undermine the usability of VoIP. Unlike the detection and filtering of e-mail spam, countermeasures against spit face the significant challenge of identifying and filtering spam calls in real time. In this paper, we propose a user-behavior-aware anti-spit technique, implemented at the router level, for detecting and filtering spit. The rationale for the technique is that voice spammers, driven by revenue, behave significantly differently from legitimate callers. The technique defines and combines three features derived from analyses of user behavior to detect and filter spam calls. Compared to existing spit defense techniques, it is simple, fast, and effective. Other advantages of our approach are that it can detect and filter both machine-initiated and human-initiated spam calls, and that it better protects VoIP calls against Sybil attacks and changes in spammer behavior.
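The three behavioral features are not enumerated in the abstract, so the following Python sketch uses three plausible stand-ins (call rate, average call duration, and the fraction of distinct callees) combined by simple thresholds into a spam decision. The features, weights, and thresholds are hypothetical and only illustrate how behavior-based scoring at a router could work, not the technique proposed in the paper.

```python
def spit_score(calls_per_hour, avg_duration_sec, distinct_callee_ratio):
    """Combine three hypothetical behavioral features into a score in [0, 1].
    Spammers tend to place many short calls to mostly distinct callees."""
    score = 0.0
    score += 0.4 if calls_per_hour > 30 else 0.0          # unusually high call rate
    score += 0.3 if avg_duration_sec < 10 else 0.0        # callees hang up quickly
    score += 0.3 if distinct_callee_ratio > 0.9 else 0.0  # almost never repeats a callee
    return score

def filter_call(caller_stats, threshold=0.6):
    """Drop the call if the caller's behavioral score crosses the threshold."""
    return spit_score(**caller_stats) >= threshold

legit = {"calls_per_hour": 4, "avg_duration_sec": 120, "distinct_callee_ratio": 0.5}
spammer = {"calls_per_hour": 200, "avg_duration_sec": 6, "distinct_callee_ratio": 0.99}

print(filter_call(legit))    # False
print(filter_call(spammer))  # True
```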