The role-set approach is a new conceptual framework for data integration in multidatabase systems that maintains the materialization autonomy of local database systems and provides users with more accurate information. The role-set approach presents the answer to a query as a set of relations in which the distinct intersections between the relations correspond to the various roles played by an entity. In this paper we show how the basic role-based approach can be extended in the absence of information about the multidatabase keys (global IDs). We propose a strategy based on ranked role-sets that uses a semantic integration procedure based on neural networks to determine candidate global IDs. The data integration and query processing steps then produce a number of role-sets, ranked by the similarity of the candidate IDs.
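A minimal sketch of the ranking idea, assuming the neural-network integration step has already produced a similarity score for each candidate global ID; the function names and data layout below are illustrative, not the paper's actual interface:

    # Illustrative sketch: form the role-set for two relations under one
    # candidate global ID, then rank role-sets by that ID's similarity score.
    def role_set(rel_a, rel_b, key):
        """rel_a, rel_b: lists of dicts; key: candidate global-ID attribute.
        Returns the three 'roles': only in A, only in B, and in both."""
        ids_a = {t[key] for t in rel_a}
        ids_b = {t[key] for t in rel_b}
        return {"only_a": ids_a - ids_b, "only_b": ids_b - ids_a, "both": ids_a & ids_b}

    def rank_role_sets(candidates, rel_a, rel_b):
        """candidates: list of (similarity, key) pairs from the integration step."""
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        return [(sim, key, role_set(rel_a, rel_b, key)) for sim, key in ranked]

    students = [{"ssn": "111", "name": "Ann"}, {"ssn": "222", "name": "Bob"}]
    employees = [{"ssn": "222", "name": "Bob"}, {"ssn": "333", "name": "Eve"}]
    for sim, key, rs in rank_role_sets([(0.92, "ssn")], students, employees):
        print(sim, key, rs)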
Data mining enables us to discover information we do not expect to find in databases. This can be a security/privacy issue: If we make information available, are we perhaps giving out more than we bargained for? This position paper discusses possible problems and solutions, and outlines ideas for further research in this area.
Classifying software modules in a component library is a major problem in software reuse. Indexing criteria must adequately reflect the semantics of the components. This must be done without undue effort in either classifying the software or developing "queries" to find candidates for reuse. We present an architecture for automatically classifying and querying software based on design information. We present a method for determining whether indexing criteria are effective, and show results using a set of criteria automatically extracted from an existing collection of programs.
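A hedged sketch of how automatically extracted indexing criteria could drive classification and querying; a simple keyword index over source text stands in here for the paper's design-information criteria:

    # Illustrative stand-in: index components by terms extracted from their
    # source text, then rank candidates for a reuse query by term overlap.
    import re

    def extract_terms(source):
        """Split identifiers and comments into lowercase terms (illustrative criterion)."""
        return set(re.findall(r"[a-zA-Z]+", source.lower()))

    def build_index(components):
        """components: dict of component name -> source text."""
        return {name: extract_terms(src) for name, src in components.items()}

    def query(index, text):
        q = extract_terms(text)
        scored = [(len(q & terms), name) for name, terms in index.items()]
        return [name for score, name in sorted(scored, reverse=True) if score > 0]

    library = {
        "sort_util": "def quick_sort(items): ...  # sort a list in place",
        "net_io": "def open_socket(host, port): ...  # TCP client helper",
    }
    print(query(build_index(library), "routine to sort a list"))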
One important step in integrating heterogeneous databases is matching equivalent attributes: determining which fields in two databases refer to the same data. The meaning of information may be embodied within a database model, a conceptual schema, application programs, or data contents. Integration involves extracting semantics, expressing them as metadata, and matching semantically equivalent data elements. We present a procedure that uses a classifier to categorize attributes according to their field specifications and data values, then trains a neural network to recognize similar attributes. In our technique, the knowledge of how to match equivalent data elements is "discovered" from metadata, not "pre-programmed".
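A minimal sketch of the matching idea, with cosine similarity over simple feature vectors standing in for the trained neural network; the features below (type flag, declared length, average value length) are illustrative examples of field-specification and data-value evidence:

    # Illustrative sketch: describe each attribute by features drawn from its
    # field specification and data values, then match attributes by vector
    # similarity (cosine similarity here is a stand-in for the trained network).
    import math

    def features(field_type, max_length, values):
        is_numeric = 1.0 if field_type == "numeric" else 0.0
        avg_len = sum(len(str(v)) for v in values) / max(len(values), 1)
        return [is_numeric, float(max_length), avg_len]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    emp_salary = features("numeric", 10, [52000, 61000, 48000])
    staff_pay = features("numeric", 12, [50500, 70250])
    dept_name = features("char", 30, ["Sales", "Research"])

    print("salary ~ pay :", round(cosine(emp_salary, staff_pay), 3))
    print("salary ~ dept:", round(cosine(emp_salary, dept_name), 3))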
One step in integrating heterogeneous database systems is matching equivalent attributes: determining which fields in the two databases refer to the same data. The authors see three (complementary) techniques to automate this process: synonym dictionaries that compare field names, design criteria that compare field specifications, and comparison of data values. They present a technique for using field specifications to compare attributes, and evaluate this technique on a variety of databases.
The Gold Mailer, a system that provides users with an integrated way to send and receive messages using different media, efficiently store and retrieve these messages, and access a variety of sources of other useful information, is described. The mailer solves the problems of information overload, organization of messages and multiple interfaces. By providing good storage and retrieval facilities, it can be used as a powerful information processing engine covering a range of useful office information. The Gold Mailer's query language, indexing engine, file organization, data structures, and support of mail message data and multimedia documents are discussed.
A query language has been developed that serves as an extension of the browsing model of hypertext systems. The query language and data model fit naturally into a distributed environment. A simple and efficient method is discussed for processing distributed queries in this language. Results of experiments run on a distributed data server using this algorithm are presented.
Database indexing is a well studied problem. However, the advent of hypertext databases opens new questions in indexing. Searches are often demarcated by pointers between text items. Thus the scope of a search may change dynamically, whereas traditional indexes cover a statically defined region such as a relation. We present techniques for indexing in hypertext databases and compare their performance.
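A small sketch of the problem setting, assuming documents linked by pointers and a keyword search whose scope is the set of nodes reachable from a starting point; the data structures are illustrative, not the paper's index organizations:

    # Illustrative sketch: a global inverted index, with each search restricted
    # to the region of the hypertext reachable from a given start node.
    from collections import deque

    docs = {1: "database indexing survey", 2: "hypertext links and search",
            3: "cooking with garlic"}
    links = {1: [2], 2: [3], 3: []}

    inverted = {}
    for doc_id, text in docs.items():
        for word in text.split():
            inverted.setdefault(word, set()).add(doc_id)

    def reachable(start):
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in links.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def search(word, start):
        return inverted.get(word, set()) & reachable(start)

    print(search("search", 1))  # node 2 is in scope via the link 1 -> 2
    print(search("search", 3))  # node 2 is not reachable from node 3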
In this paper a Document Base Management System is proposed that incorporates conventional database and hypertext ideas into a document database. The Document Base operates as a server, users access the database through different application programs. the query language which applications use to retrieve documents is described.
The first step in interoperating among multidatabases is semantic integration: producing attribute correspondences that describe relationships between attributes or classes in different database schemas. Dynamic integration requires the ability to automatically extract database semantics, express them as metadata, and match semantically equivalent data elements to produce attribute correspondences. This process cannot be pre-programmed, since the information to be accessed is heterogeneous. An architecture supporting dynamic integration is presented. Semint, a tool for automated semantic integration that helps database administrators generate attribute correspondences, is discussed. A novel framework for dynamic integration and a query language for multidatabase systems that uses Semint as part of a complete semantic integration service are introduced. The framework supports dynamic as well as incremental integration. Its advantages are shown in environments where full integration is not desired or complete knowledge of the databases to be integrated is unavailable.
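A hedged sketch of how attribute correspondences might be represented and accumulated incrementally as new databases are brought in; the triple format and threshold below are illustrative, not Semint's actual interface:

    # Illustrative sketch: attribute correspondences as (attr_a, attr_b, confidence)
    # triples, merged incrementally so later databases can be integrated
    # without redoing earlier work.
    def add_correspondences(existing, new_pairs, threshold=0.8):
        """Keep only correspondences whose confidence meets the threshold."""
        kept = [c for c in new_pairs if c[2] >= threshold]
        return existing + kept

    corr = []
    corr = add_correspondences(corr, [("EMP.salary", "STAFF.pay", 0.93),
                                      ("EMP.dept", "STAFF.unit", 0.71)])
    corr = add_correspondences(corr, [("EMP.salary", "PAYROLL.amount", 0.88)])
    for a, b, conf in corr:
        print(f"{a} <-> {b}  ({conf})")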
The secure multi-party computation (SMC) model provides means for balancing the use and confidentiality of distributed data. Increasing security concerns have led to a surge in work on practical secure multi-party computation protocols. However, most are only proven secure under the semi-honest model, and security under this adversary model is insufficient for most applications. In this paper, we propose a novel framework, the accountable computing (AC) framework, which is sufficient for many practical applications without the complexity and cost of an SMC protocol under the malicious model. Furthermore, to show the applicability of the AC framework, we present an application under this framework regarding privacy-preserving frequent itemset mining.
Although the benefits of information sharing between supply-chain partners are well known, many companies are averse to sharing their "private" information for fear of the adverse impact of information leakage. This paper uses techniques from Secure Multiparty Computation (SMC) to develop "secure protocols" for the CPFR® (Collaborative Planning, Forecasting, and Replenishment) business process.
We study data integrity verification in peer-to-peer media streaming for content distribution. Challenges include the timing constraint of streaming as well as the untrustworthiness of peers. We show the inadequacy of existing data integrity verification protocols, and propose Block-Oriented Probabilistic Verification (BOPV), an efficient protocol utilizing message digests and probabilistic verification. We then propose the Tree-based Forward Digest Protocol (TFDP) to further reduce the communication overhead. A comprehensive comparison of existing protocols and our protocols is presented, with respect to overhead, security assurance level, and packet loss tolerance. Finally, experimental results are presented to evaluate the performance of our protocols.
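A minimal sketch of the probabilistic digest-checking idea, assuming the receiver already holds authentic digests for every block and checks only a random fraction of the blocks it receives; the parameters and layout are illustrative, not the BOPV or TFDP wire format:

    # Illustrative sketch: verify a random sample of received blocks against
    # pre-distributed digests, trading verification cost against assurance.
    import hashlib
    import random

    def digest(block):
        return hashlib.sha256(block).hexdigest()

    def probabilistic_verify(blocks, known_digests, sample_rate=0.25):
        """blocks: byte strings received from a peer; known_digests: authentic digests."""
        sample = random.sample(range(len(blocks)), max(1, int(sample_rate * len(blocks))))
        return all(digest(blocks[i]) == known_digests[i] for i in sample)

    original = [f"segment-{i}".encode() for i in range(8)]
    digests = [digest(b) for b in original]

    tampered = list(original)
    tampered[3] = b"injected data"
    print(probabilistic_verify(original, digests))   # True
    print(probabilistic_verify(tampered, digests))   # True or False, depending on the sample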
Data mining introduces new problems in database security. The basic problem of using non-sensitive data to infer sensitive data is made more difficult by the “probabilistic†inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes where data mining tools cannot obtain reliable results.
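The paper's bounds come from pattern recognition theory; purely as an illustrative stand-in (not the paper's actual bound), a Hoeffding-style calculation shows the flavor of the argument: below a certain sample size, a mined estimate cannot be both accurate and reliable.

    # Illustrative stand-in: the smallest sample size n for which a mined
    # proportion can be estimated within epsilon with confidence 1 - delta,
    # using Hoeffding's inequality. Releases smaller than this cannot support
    # estimates at that reliability level.
    import math

    def min_reliable_sample(epsilon, delta):
        """Smallest n with P(|estimate - truth| > epsilon) <= delta (Hoeffding)."""
        return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

    # At 5% error and 95% confidence, roughly 738 records are needed.
    print(min_reliable_sample(epsilon=0.05, delta=0.05))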