The Center for Education and Research in Information Assurance and Security (CERIAS)

Reports and Papers Archive


Passwords Decay, Words Endure: Secure and Re-usable Multiple Password Mnemonics

CERIAS TR 2007-98
Atallah
Download: PDF
Added 2008-02-01

Secure and Private Collaborative Linear Programming

CERIAS TR 2006-64
Atallah
Download: PDF
Added 2008-02-01

Point-Based Trust: Define How Much Privacy Is Worth

CERIAS TR 2006-63
Atallah
Download: PDF

This paper studies the notion of point-based policies for trust management, and gives protocols for realizing them in a disclosure-minimizing fashion. Specifically, Bob values each credential with a certain number of points, and requires a minimum total threshold of points before granting Alice access to a resource. In turn, Alice values each of her credentials with a privacy score that indicates her reluctance to reveal that credential. Bob’s valuation of credentials and his threshold are private. Alice’s privacy-valuation of her credentials is also private. Alice wants to find a subset of her credentials that achieves Bob’s required threshold for access, yet is of as small a value to her as possible. We give protocols for computing such a subset of Alice’s credentials without revealing either party’s private information described above.
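Stripped of its privacy requirements, the subset Alice wants can be viewed as a minimum-cost covering variant of the 0/1 knapsack problem. She picks credentials whose points sum to at least Bob's threshold while minimizing her total privacy score. The Python sketch below solves only that plaintext optimization with dynamic programming, assuming non-negative integer points and scores. It sees both parties' inputs, so it is not the paper's privacy-preserving protocol. The function name `min_privacy_subset` is hypothetical.

```python
from typing import List, Optional, Tuple


def min_privacy_subset(
    points: List[int], privacy: List[int], threshold: int
) -> Optional[Tuple[int, List[int]]]:
    """Return (total privacy score, credential indices) for the cheapest subset
    whose points reach `threshold`, or None if no subset does.

    Plaintext only: this function sees both Bob's points and Alice's scores.
    Assumes non-negative integer points and privacy scores.
    """
    INF = float("inf")
    # best[t]: minimum privacy score needed to collect at least t points,
    # with t capped at `threshold` (any surplus counts as reaching it).
    best = [INF] * (threshold + 1)
    chosen: List[List[int]] = [[] for _ in range(threshold + 1)]
    best[0] = 0
    for i, (a, p) in enumerate(zip(points, privacy)):
        # Sweep t downward so each credential is used at most once (0/1 knapsack).
        for t in range(threshold, -1, -1):
            if best[t] == INF:
                continue
            nt = min(threshold, t + a)
            if best[t] + p < best[nt]:
                best[nt] = best[t] + p
                chosen[nt] = chosen[t] + [i]
    if best[threshold] == INF:
        return None
    return best[threshold], chosen[threshold]
```

For example, with points `[3, 5, 2]`, privacy scores `[4, 7, 1]` and a threshold of 5, the call returns `(5, [0, 2])`. Revealing credentials 0 and 2 is cheaper for Alice than revealing credential 1 alone.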

Added 2008-02-01

Words are Not Enough: Sentence Level Natural Language Watermarking

CERIAS TR 2006-62
Atallah
Download: PDF
Added 2008-02-01


Lost in Just the Translation

CERIAS TR 2006-60
Atallah
Download: PDF
Added 2008-02-01

ViWiD: Visible Watermarking-Based Defense Against Phishing

CERIAS TR 2005-130
Atallah
Download: PDF
Added 2008-02-01

Privacy-preserving distributed mining of association rules on horizontally partitioned data

CERIAS TR 2004-91
Christopher Clifton
Download: PDF

Data mining can extract important knowledge from large data collections, but sometimes these collections are split among various parties. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. We address secure mining of association rules over horizontally partitioned data. The methods incorporate cryptographic techniques to minimize the information shared, while adding little overhead to the mining task.
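In horizontally partitioned settings, a common building block is the secure-sum protocol. It lets the sites learn a total without seeing each other's local counts. The Python sketch below simulates secure sum in one process. It uses the protocol to test whether an itemset meets a global support threshold, given each site's local support count and database size. It illustrates the general idea only and does not reproduce the paper's protocol. The names `secure_sum` and `is_globally_frequent`, the integer-percentage threshold, and the assumption of non-colluding sites are all choices made for this example.

```python
import secrets
from typing import List


def secure_sum(local_values: List[int], modulus: int) -> int:
    """Simulate the ring-based secure-sum protocol in a single process.

    Site 0 masks its value with a uniformly random R; each later site adds its
    own value modulo `modulus` and forwards the running total, so no site sees
    another site's individual input. Site 0 then removes R.
    Returns the sum of the inputs modulo `modulus`. Assumes non-colluding sites.
    """
    mask = secrets.randbelow(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus  # total forwarded to the next site
    return (running - mask) % modulus


def is_globally_frequent(
    local_counts: List[int],
    local_sizes: List[int],
    min_support_pct: int,
    modulus: int = 2**61 - 1,
) -> bool:
    """Decide whether an itemset's global support is at least min_support_pct %.

    Each site contributes its excess support 100*count - pct*size, which may be
    negative; negative totals land in the upper half of the ring.
    Assumes |total excess| < modulus // 2.
    """
    excess = [100 * c - min_support_pct * n for c, n in zip(local_counts, local_sizes)]
    total = secure_sum([e % modulus for e in excess], modulus)
    signed_total = total - modulus if total > modulus // 2 else total
    return signed_total >= 0
```

In this simplified form, site 0 still learns the total excess support. Revealing only its sign would require an additional secure-comparison step.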

Added 2008-01-31

TopCat: data mining for topic identification in a text corpus

CERIAS TR 2004-90
Christopher Clifton
Download: PDF

TopCat (topic categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: Identifying related groups of items. We present a novel method for identifying related items based on traditional data mining techniques. Frequent itemsets are generated from the groups of items, followed by clusters formed with a hypergraph partitioning scheme. We present an evaluation against a manually categorized ground truth news corpus; it shows this technique is effective in identifying topics in collections of news articles.
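The frequent-itemset step can be illustrated by treating each article as its set of extracted entities. The sketch then counts entity pairs that co-occur in at least a minimum number of articles. The Python sketch below covers only pairs. It omits the hypergraph-partitioning step that forms the final topic clusters. `frequent_pairs` and the toy entity sets in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def frequent_pairs(
    articles: Iterable[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Count entity pairs that co-occur in at least `min_support` articles.

    Each article is represented as the set of entities extracted from it.
    """
    counts: Counter = Counter()
    for entities in articles:
        # sorted() gives a deterministic order; frozenset makes the pair order-free.
        for pair in combinations(sorted(entities), 2):
            counts[frozenset(pair)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Take the toy articles `{"hurricane", "florida", "evacuation"}`, `{"hurricane", "florida"}`, `{"election", "senate"}` and `{"election", "senate", "recount"}` with `min_support=2`. Only the pairs `{hurricane, florida}` and `{election, senate}` survive, and each becomes a candidate seed for a topic.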

Added 2008-01-31

Change Detection in Overhead Imagery Using Neural Networks

CERIAS TR 2003-45
Christopher Clifton
Download: PDF

Identifying interesting changes from a sequence of overhead imagery—as opposed to clutter, lighting/seasonal changes, etc.—has been a problem for some time. Recent advances in data mining have greatly increased the size of datasets that can be attacked with pattern discovery methods. This paper presents a technique for using predictive modeling to identify unusual changes in images. Neural networks are trained to predict “before” and “after” pixel values for a sequence of images. These networks are then used to predict expected values for the same images used in training. Substantial differences between the expected and actual values represent an unusual change. Results are presented on both multispectral and panchromatic imagery.
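The predict-then-compare idea can be sketched with a much simpler predictor than a neural network. In the Python example below, a least-squares line stands in for the paper's networks. The line predicts each pixel's "after" value from its "before" value, absorbing global effects such as a uniform lighting change. The sketch then flags pixels whose residual exceeds `k` standard deviations. The function names, the single-band pixel lists, and the default `k = 2.0` are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_linear(before: List[float], after: List[float]) -> Tuple[float, float]:
    """Least-squares fit of after = slope * before + intercept."""
    mx, my = mean(before), mean(after)
    sxx = sum((x - mx) ** 2 for x in before)
    sxy = sum((x - mx) * (y - my) for x, y in zip(before, after))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def unusual_changes(before: List[float], after: List[float], k: float = 2.0) -> List[int]:
    """Return indices of pixels whose actual 'after' value departs from the
    predicted value by more than k residual standard deviations."""
    slope, intercept = fit_linear(before, after)
    residuals = [y - (slope * x + intercept) for x, y in zip(before, after)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every pixel is explained by the global model
    return [i for i, r in enumerate(residuals) if abs(r) > k * spread]
```

As in the abstract, the model is fit and evaluated on the same images. A uniform brightness shift is therefore absorbed by the slope and intercept, and only pixels that break the overall pattern are reported.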

Added 2008-01-31

Emerging standards for data mining

CERIAS TR 2001-80
Christopher Clifton
Download: PDF

This paper presents an overview of data mining, then discusses standards (both existing and proposed) that are relevant to data mining. This includes standards that affect several stages of a data mining project. Summaries of several emerging standards are given, as well as proposals that have the potential to change the way data mining tools are built.

Added 2008-01-31

Using sample size to limit exposure to data mining

CERIAS TR 2001-79
Christopher Clifton
Download: PDF

Data mining introduces new problems in database security. The basic problem of using non-sensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes where data mining tools cannot obtain reliable results.

Added 2008-01-31

SEMINT: A tool for identifying attribute correspondences in heterogeneous databases using neural networks

CERIAS TR 2001-78
Christopher Clifton
Download: PDF

One step in interoperating among heterogeneous databases is semantic integration: Identifying relationships between attributes or classes in different database schemas. SEMantic INTegrator (SEMINT) is a tool based on neural networks to assist in identifying attribute correspondences in heterogeneous databases. SEMINT supports access to a variety of database systems and utilizes both schema information and data contents to produce rules for matching corresponding attributes automatically. This paper provides theoretical background and implementation details of SEMINT. Experimental results from large and complex real databases are presented. We discuss the effectiveness of SEMINT and our experiences with attribute correspondence identification in various environments.

Added 2008-01-31

Database Integration Using Neural Networks: Implementation and Experiences

CERIAS TR 2001-77
Christopher Clifton
Download: PDF

Applications in a wide variety of industries require access to multiple heterogeneous distributed databases. One step in heterogeneous database integration is semantic integration: identifying corresponding attributes in different databases that represent the same real world concept. The rules of semantic integration cannot be ‘pre-programmed’ since the information to be accessed is heterogeneous and attribute correspondences could be fuzzy. Manually comparing all possible pairs of attributes is an unreasonably large task. We have applied artificial neural networks (ANNs) to this problem. Metadata describing attributes is automatically extracted from a database to represent their ‘signatures’. The metadata is used to train neural networks to find similar patterns of metadata describing corresponding attributes from other databases. In our system, the rules to determine corresponding attributes are discovered through machine learning. This paper describes how we applied neural network techniques in a database integration problem and how we represent an attribute with its metadata as discriminators. This paper focuses on our experiments measuring the effectiveness of the neural networks and of each discriminator. We also discuss difficulties of using neural networks for this problem and our wish list for the Machine Learning community.
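The attribute-signature idea can be sketched without a neural network. Describe each numeric column by a few statistics, then pair each attribute with the attribute in the other database whose signature is closest. The Python example below substitutes nearest-neighbor matching on Euclidean distance for the trained network. Its discriminators (mean, population standard deviation, minimum, maximum) are assumptions and are not the paper's discriminator set. `signature` and `match_attributes` are hypothetical names.

```python
from math import dist
from statistics import mean, pstdev
from typing import Dict, List


def signature(values: List[float]) -> List[float]:
    """Hypothetical attribute signature built from simple column statistics."""
    return [mean(values), pstdev(values), min(values), max(values)]


def match_attributes(
    db_a: Dict[str, List[float]], db_b: Dict[str, List[float]]
) -> Dict[str, str]:
    """Map each attribute of db_a to the db_b attribute with the nearest signature."""
    sigs_b = {name: signature(col) for name, col in db_b.items()}
    matches: Dict[str, str] = {}
    for name, col in db_a.items():
        sig = signature(col)
        matches[name] = min(sigs_b, key=lambda other: dist(sig, sigs_b[other]))
    return matches
```

The statistics are not rescaled, so columns with large magnitudes dominate the distance. Normalizing each discriminator to a common range would be a natural refinement before comparing signatures.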

Added 2008-01-31