Reports and Papers Archive - Reports & Papers

SP-GiST: An Extensible Database Index for Supporting Space Partitioning Trees

WG Aref, IF Ilyas

Download: PDF

Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST is presented that supports this class of data structures, mainly the class of space partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST’‘s tuning parameters.

Added 2008-04-21

Space-Partitioning Trees in PostgreSQL: Realization and Performance

MY Eltabakh, R Eltarras, WG Aref

Download: PDF

Many evolving database applications warrant the use of non-traditional indexing mechanisms beyond B+-trees and hash tables. SP-GiST is an extensible indexing framework that broadens the class of supported indexes to include disk-based versions of a wide variety of space-partitioning trees, e.g., disk-based trie variants, quadtree variants, and kd-trees. This paper presents a serious attempt at implementing and realizing SP-GiST-based indexes inside PostgreSQL. Several index types are realized inside PostgreSQL facilitated by rapid SP-GiST instantiations. Challenges, experiences, and performance issues are addressed in the paper. Performance comparisons are conducted from within PostgreSQL to compare update and search performances of SP-GiST-based indexes against the B+-tree and the R-tree for string, point, and line segment data sets. Interesting results that highlight the potential performance gains of SPGiST- based indexes are presented in the paper.

Added 2008-04-21

A Constraint-Based Approach for the Authoring of Multi-Topic Multimedia Presentations

Elisa Bertino

Synchronized multimedia applications play an important role in a digital library environment, since they allow one to efficiently disseminate knowledge among differently skilled users through an approach, which is more direct than the classic ‘static’ documents. In this paper, we propose a new authoring approach based on an innovative presentation structure and a new class of content-based constraints. Thanks to a flexible heuristic process, such features allow the author to easily combine several multimedia objects into a multi-topic presentation, whose different contents can be freely chosen by end users according to their preferences or skills

Added 2008-04-21

Auth-SL - A System for the Specification and Enforcement of Quality-Based Authentication Policies

Elisa Bertino

This paper develops a language and a reference architecture supporting the management and enforcement of authentication policies. Such language directly supports multi-factor authentication and the high level specification of authentication factors, in terms of conditions against the features of the various authentication mechanisms and modules. In addition the language supports a rich set of constraints; by using these constraints, one can specify for example that a subject must be authenticated by two credentials issued by different authorities. The paper presents a logical definition of the language and its corresponding XML encoding. It also reports an implementation of the proposed authentication system in the context of the FreeBSD Unix operating system (OS). Critical issues in the implementation are discussed and performance results are reported. These results show that the implementation is very efficient.

Added 2008-04-21

A hierarchical access control model for video database systems

Elisa Bertino

Content-based video database access control is becoming very important, but it depends on the progresses of the following related research issues: (a) efficient video analysis for supporting semantic visual concept representation; (b) effective video database indexing structure; (c) the development of suitable video database models; and (d) the development of access control models tailored to the characteristics of video data. In this paper, we propose a novel approach to support multilevel access control in video databases. Our access control technique combines a video database indexing mechanism with a hierarchical organization of visual concepts (i.e., video database indexing units), so that different classes of users can access different video elements or even the same video element with different quality levels according to their permissions. These video elements, which, in our access control mechanism, are used for specifying the authorization objects, can be a semantic cluster, a subcluster, a video scene, a video shot, a video frame, or even a salient object (i.e., region of interest). In the paper, we first introduce our techniques for obtaining these multilevel video access units. We also propose a hierarchical video database indexing technique to support our multilevel video access control mechanism. Then, we present an innovative access control model which is able to support flexible multilevel access control to video elements. Moreover, the application of our multilevel video database modeling, representation, and indexing for MPEG-7 is discussed.

Added 2008-04-21

WARP: Time Warping for Periodicity Detection

MG Elfeky, WG Aref, AK Elmagarmid

Periodicity mining is used for predicting trends in time series data. Periodicity detection is an essential process in periodicity mining to discover potential periodicity rates. Existing periodicity detection algorithms do not take into account the presence of noise, which is inevitable in almost every real-world time series data. In this paper, we tackle the problem of periodicity detection in the presence of noise. We propose a new periodicity detection algorithm that deals efficiently with all types of noise. Based on time warping, the proposed algorithm warps (extends or shrinks) the time axis at various locations to optimally remove the noise. Experimental results show that the proposed algorithm outperforms the existing periodicity detection algorithms in terms of noise resiliency.

Added 2008-04-21

On query processing and optimality using spectral locality-preserving mappings

MF Mokbel, WG Aref

A locality-preserving mapping (LPM) from the multi-dimensional space into the one-dimensional space is beneficial for many applications (e.g., range queries, nearest-neighbor queries, clustering, and declustering) when multi-dimensional data is placed into one-dimensional storage (e.g., the disk). The idea behind a locality-preserving mapping is to map points that are nearby in the multi-dimensional space into points that are nearby in the one-dimensional space. For the past two decades, fractals (e.g., the Hilbert and Peano space-filling curves) have been considered the natural method for providing a locality-preserving mapping to support efficient answer for range queries and similarity search queries. In this paper, we go beyond the idea of fractals. Instead, we investigate a locality-preserving mapping algorithm (The Spectral LPM) that uses the spectrum of the multi-dimensional space. This paper provably demonstrates how Spectral LPM provides a globally optimal mapping from the multi-dimensional space to the one-dimensional space, and hence outperforms fractals. As an application, in the context of range queries and nearest-neighbor queries, empirical results of the performance of Spectral LPM validate our analysis in comparison with Peano, Hilbert, and Gray fractal mappings.

Added 2008-04-21

CiVeDi: a customized virtual environment for database interaction

Elisa Bertino

This paper presents CiVedi, a scalable system providing a flexible and customizable virtual environment for displaying multimedia contents. Using CiVeDi, both the final users and the exhibition curators can personalize the content of the visit as well as the visit appearance and its duration. The proposed solution aims to be used transparently over different media objects either stored into a database or dynamically collected from online digital libraries.

Added 2008-04-21

Incremental, online, and merge mining of partial periodic patterns in time-series databases

WG Aref, MG Elfeky, AK Elmagarmid

Abstract—Mining of periodic patterns in time-series databases is an interesting data mining problem. It can be envisioned as a tool for forecasting and prediction of the future behavior of time-series data. Incremental mining refers to the issue of maintaining the discovered patterns over time in the presence of more items being added into the database. Because of the mostly append only nature of updating time-series data, incremental mining would be very effective and efficient. Several algorithms for incremental mining of partial periodic patterns in time-series databases are proposed and are analyzed empirically. The new algorithms allow for online adaptation of the thresholds in order to produce interactive mining of partial periodic patterns. The storage overhead of the incremental online mining algorithms is analyzed. Results show that the storage overhead for storing the intermediate data structures pays off as the incremental online mining of partial periodic patterns proves to be significantly more efficient than the nonincremental nononline versions. Moreover, a new problem, termed merge mining, is introduced as a generalization of incremental mining. Merge mining can be defined as merging the discovered patterns of two or more databases that are mined independently of each other. An algorithm for merge mining of partial periodic patterns in time-series databases is proposed and analyzed.

Added 2008-04-21

Scalability Management in Sensor-Network PhenomenaBases

MH Ali, WG Aref, I Kamel

A phenomenon appears in a sensor network when a group of sensors persist to generate similar behavior over a period of time. PhenomenaBases (or databases of phenomena) are equipped with Phenomena Detection and Tracking (PDT) techniques that continuously run in the background of a sensor database system to detect new phenomena and to track already existing phenomena. The process of phenomena detection and tracking is initiated by a multi-way join operator that comes at the core of PDT techniques to report similar sensor readings. With the increase in the sensor network size, the join operator (and, consequently, query processing in the PhenomenaBase) face several scalability challenges. In this paper, we present a join operator for PhenomenaBases (the SNJoin operator) that is specially-designed for dynamically-configured large-scale sensor networks with distributed processing capabilities. Experimental studies illustrate the scalability of the proposed join operator in PhenomenaBases with respect to the number of detected phenomena and the output delay.

Added 2008-04-21

X-FEDERATE: A policy engineering framework for federated access management

Elisa Bertino

Added 2008-04-21

Search-based buffer management policies for streaming in continuous media servers

MA Hammad, WG Aref, AK Elmagarmid, A.K

In this paper we propose efficient buffer prefetching and replacement policies for continuous-media servers that support content-based search and retrieval. The new policies are based on the knowledge collected from the content-based search manager and the streaming manager. We show that by integrating the knowledge from the search and streaming components, we can achieve better caching of media streams, thus minimizing initial latency and reducing disk I/O. We test the search-based policies on a prototype video database system developed at Purdue University. The results show that initial latency is reduced on the average by 20%, compared to the traditional policies.

Added 2008-04-21

Query Indexing and Velocity Constrained Indexing: Scalable Techniques For Continuous Queries on Moving Objects

S Prabhakar, Y Xia, D Kalashnikov, WG Aref, S Hambrusch

Moving object environments are characterized by large numbers of moving objects and numerous concurrent continuous queries over these objects. Efficient evaluation of these queries in response to the movement of the objects is critical for supporting acceptable response times. In such environments the traditional approach of building an index on the objects (data) suffers from the need for frequent updates and thereby results in poor performance. In fact, a brute force, no-index strategy yields better performance in many cases. Neither the traditional approach, nor the brute force strategy achieve reasonable query processing times. This paper develops novel techniques for the efficient and scalable evaluation of multiple continuous queries on moving objects. Our solution leverages two complimentary techniques: Query Indexing and Velocity Constrained Indexing (VCI). Query Indexing relies on i) incremental evaluation; ii) reversing the role of queries and data; and iii) exploiting the relative locations of objects and queries. VCI takes advantage of the maximum possible speed of objects in order to delay the expensive operation of updating an index to reflect the movement of objects. In contrast to an earlier technique [29] that requires exact knowledge about the movement of the objects, VCI does not rely on such information. While Query Indexing outperforms VCI, it does not efficiently handle the arrival of new queries. Velocity constrained indexing, on the other hand, is unaffected by changes in queries. We demonstrate that a combination of Query Indexing and Velocity Constrained Indexing enables the scalable execution of insertion and deletion of queries in addition to processing ongoing queries. We also develop several optimizations and present a detailed experimental evaluation of our techniques. The experimental results show that the proposed schemes outperform the traditional approaches by almost two orders of magnitude.

Added 2008-04-21

Modeling and language support for the management of pattern-bases

Elisa Bertino, Manolis terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, Barbara Catania, Anna Maddalena and Stefano Rizzi

Information overloading is today a serious concern that may hinder the potential of modern web-based information systems. A promising approach to deal with this problem is represented by knowledge extraction methods able to produce artifacts (also called patterns) that concisely represent data. Patterns are usually quite heterogeneous and voluminous. So far, little emphasis has been posed on developing an overall integrated environment for uniformly representing and querying different types of patterns. In this paper we consider the larger problem of modeling, storing, and querying patterns, in a database-like setting and use a Pattern-Base Management System (PBMS) for this purpose. Specifically, (a) we formally define the logical foundations for the global setting of pattern management through a model that covers data, patterns, and their intermediate mappings; (b) we present a formalism for pattern specification along with safety restrictions; and (c) we introduce predicates for comparing patterns and query operators.

Added 2008-04-21

Discovering Consensus Patterns in Biological Databases

MY ElTabakh, WG Aref, M Ouzzani, MH Ali

Consensus patterns, like motifs and tandem repeats, are highly conserved patterns with very few substitutions where no gaps are allowed. In this paper, we present a progressive hierarchical clustering technique for discovering consensus patterns in biological databases over a certain length range. This technique can discover consensus patterns with various requirements by applying a post-processing phase. The progressive nature of the hierarchical clustering algorithm makes it scalable and efficient. Experiments to discover motifs and tandem repeats on real biological databases show significant performance gain over non-progressive clustering techniques.

Added 2008-04-21