Reports and Papers Archive - Reports & Papers

Place: A Distributed Spatio-Temporal Data Stream Management System for Moving Objects

X Xiong, HG Elmongui, X Chai, WG Aref

Added 2008-04-22

Detection and tracking of discrete phenomena in sensor-network databases

MH Ali, MF Mokbel, WG Aref, I Kamel

This paper introduces a framework for Phenomena Detection and Tracking (PDT, for short) in sensor network databases. Examples of detectable phenomena include the propagation over time of a pollution cloud or an oil spill region. We provide a crisp definition of a phenomenon that takes into consideration both the strength and the time span of the phenomenon. We focus on discrete phenomena where sensor readings are drawn from a discrete set of values, e.g., item numbers or pollutant IDs, and we point out how our work can be extended to handle continuous phenomena. The challenge for the proposed PDT framework is to detect as many phenomena as possible, given the large number of sensors, the overall high arrival rates of sensor data, and the limited system resources. Our proposed PDT framework uses continuous SQL queries to detect and track phenomena. Execution of these continuous queries is performed in three phases: the joining phase, the candidate selection phase, and the grouping/output phase. The joining phase employs an in-memory multi-way join algorithm that produces a set of sensor pairs with similar readings. The candidate selection phase filters the output of the joining phase to select candidate join pairs with enough strength and time span, as specified by the phenomenon definition. The grouping/output phase constructs the overall phenomenon from the candidate join pairs. We introduce two optimizations to increase the likelihood of phenomena detection while using fewer system resources. Experimental studies illustrate the performance gains of both the proposed PDT framework and the proposed optimizations.
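
The three phases described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `detect_phenomena`, the parameters `window`, `min_strength`, and `min_span`, and the use of a simple pairwise join with union-find grouping are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def detect_phenomena(readings, window, min_strength, min_span):
    """readings: iterable of (sensor_id, value, timestamp) tuples (hypothetical layout)."""
    # Joining phase: pair sensors that reported the same discrete value
    # within `window` time units of each other.
    by_value = defaultdict(list)
    for sensor, value, ts in readings:
        by_value[value].append((sensor, ts))

    matches = defaultdict(list)  # (value, sensor_a, sensor_b) -> match timestamps
    for value, items in by_value.items():
        for (s1, t1), (s2, t2) in combinations(items, 2):
            if s1 != s2 and abs(t1 - t2) <= window:
                a, b = sorted((s1, s2))
                matches[(value, a, b)].append(max(t1, t2))

    # Candidate selection phase: keep pairs with enough strength (match count)
    # and enough time span (first match to last match).
    candidates = [
        key for key, times in matches.items()
        if len(times) >= min_strength and max(times) - min(times) >= min_span
    ]

    # Grouping/output phase: merge candidate pairs into connected groups per value.
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for value, a, b in candidates:
        parent[find((value, a))] = find((value, b))

    groups = defaultdict(set)
    for value, a, b in candidates:
        for s in (a, b):
            groups[find((value, s))].add(s)
    return sorted((root[0], sorted(members)) for root, members in groups.items())
```

For example, two sensors that both report "oil" at times 0 and 5 form one phenomenon when `min_strength` is 2 and `min_span` is 5, while an isolated "dust" reading produces none.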

Added 2008-04-22

An extensible index for spatial databases

WG Aref, IF Ilyas

Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of SP-GiST's various tuning parameters.

Added 2008-04-22

Incremental Evaluation of Sliding-Window Queries over Data Streams

TM Ghanem, MA Hammad, MF Mokbel, WG Aref, AK Elmagarmid

Two research efforts have been conducted to realize sliding-window queries in data stream management systems, namely, query reevaluation and incremental evaluation. In the query reevaluation method, two consecutive windows are processed independently of each other. On the other hand, in the incremental evaluation method, the query answer for a window is obtained incrementally from the answer of the preceding window. In this paper, we focus on the incremental evaluation method. Two approaches have been adopted for the incremental evaluation of sliding-window queries, namely, the input-triggered approach and the negative tuples approach. In the input-triggered approach, only the newly inserted tuples flow in the query pipeline and tuple expiration is based on the timestamps of the newly inserted tuples. On the other hand, in the negative tuples approach, tuple expiration is separated from tuple insertion where a tuple flows in the pipeline for every inserted or expired tuple. The negative tuples approach avoids the unpredictable output delays that result from the input-triggered approach. However, negative tuples double the number of tuples through the query pipeline, thus reducing the pipeline bandwidth. Based on a detailed study of the incremental evaluation pipeline, we classify the incremental query operators into two classes according to whether an operator can avoid the processing of negative tuples or not. Based on this classification, we present several optimization techniques over the negative tuples approach that aim to reduce the overhead of processing negative tuples while avoiding the output delay of the query answer. A detailed experimental study, based on a prototype system implementation, shows the performance gains of the negative tuples approach over the input-triggered approach when accompanied by the proposed optimizations.
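
The negative tuples approach summarized above can be illustrated with a small sketch. The class names `SlidingWindow` and `IncrementalSum`, the `(sign, value)` tuple layout, and the clock-driven `advance` method are assumptions made for this example and are not taken from the paper's prototype.

```python
from collections import deque

class SlidingWindow:
    """Time-based window that emits signed tuples: +1 on insertion, -1 on expiry."""

    def __init__(self, size):
        self.size = size
        self.buffer = deque()  # (timestamp, value) in arrival order

    def insert(self, ts, value):
        self.buffer.append((ts, value))
        return [(+1, value)]

    def advance(self, now):
        """Expire tuples by clock time, independently of any new arrival."""
        out = []
        while self.buffer and self.buffer[0][0] <= now - self.size:
            _, value = self.buffer.popleft()
            out.append((-1, value))  # negative tuple cancels the earlier insertion
        return out

class IncrementalSum:
    """Downstream operator that updates its answer from signed tuples only."""

    def __init__(self):
        self.total = 0

    def process(self, signed_tuples):
        for sign, value in signed_tuples:
            self.total += sign * value
        return self.total
```

Because expiry is driven by `advance` rather than by the next arrival, the aggregate is corrected as soon as the clock moves, at the price of one extra tuple in the pipeline for every expired tuple.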

Added 2008-04-22

R-trees with Update Memos

X Xiong, WG Aref

The problem of frequently updating multi-dimensional indexes arises in many location-dependent applications. While the R-tree and its variants are among the dominant choices for indexing multi-dimensional objects, the R-tree exhibits inferior performance in the presence of frequent updates. In this paper, we present an R-tree variant, termed the RUM-tree (short for R-tree with Update Memo), that minimizes the cost of object updates. The RUM-tree processes updates in a memo-based approach that avoids disk accesses for purging old entries during an update process. Therefore, the cost of an update operation in the RUM-tree reduces to the cost of only an insert operation. The removal of old object entries is carried out by a garbage cleaner inside the RUM-tree. In this paper, we present the details of the RUM-tree and study its properties. Theoretical analysis and experimental evaluation demonstrate that the RUM-tree outperforms other R-tree variants by up to a factor of eight in scenarios with frequent updates.
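
The memo-based idea can be sketched in a few lines. This is a deliberately simplified stand-in, not the RUM-tree itself: a flat list replaces the R-tree, the memo keeps only the latest stamp per object, and the class name `MemoIndex` and its methods are hypothetical.

```python
import itertools

class MemoIndex:
    """Toy index with an update memo; a flat list stands in for the R-tree."""

    def __init__(self):
        self.entries = []  # (object_id, stamp, location); may contain stale entries
        self.latest = {}   # update memo: object_id -> stamp of the newest entry
        self._stamps = itertools.count(1)

    def update(self, oid, location):
        """An update is just an insert; the old entry is not searched for or deleted."""
        stamp = next(self._stamps)
        self.entries.append((oid, stamp, location))
        self.latest[oid] = stamp  # older entries for oid are now implicitly stale

    def query(self, predicate):
        """Return matching objects, using the memo to filter out stale entries."""
        return [(oid, loc) for oid, stamp, loc in self.entries
                if self.latest[oid] == stamp and predicate(loc)]

    def clean(self):
        """Garbage cleaner: drop stale entries and report how many were removed."""
        before = len(self.entries)
        self.entries = [e for e in self.entries if self.latest[e[0]] == e[1]]
        return before - len(self.entries)
```

The design point this illustrates is that the cost of locating and deleting the old entry is moved off the update path and into a background cleaning step.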

Added 2008-04-22

Continuous Query Processing of Spatio-Temporal Data Streams in PLACE

MF Mokbel, X Xiong, MA Hammad, WG Aref

The tremendous increase in the use of cellular phones, GPS-like devices, and RFIDs results in highly dynamic environments where objects as well as queries are continuously moving. In this paper, we present a continuous query processor designed specifically for highly dynamic environments (e.g., location-aware environments). We implemented the proposed continuous query processor inside the PLACE server (Pervasive Location-Aware Computing Environments), a scalable location-aware database server developed at Purdue University. The PLACE server extends data stream management systems to support location-aware environments. These environments are characterized by the wide variety of continuous spatio-temporal queries and the unbounded spatio-temporal streams. The proposed continuous query processor includes: (1) new incremental spatio-temporal operators to support a wide variety of continuous spatio-temporal queries, (2) extended semantics of sliding window queries to deal with spatial sliding windows as well as temporal sliding windows, and (3) a shared-execution framework for scalable execution of a set of concurrent continuous spatio-temporal queries. Experimental evaluation shows promising performance of the continuous query processor of the PLACE server.

Added 2008-04-22

Disk Scheduling in Video Editing Systems

WG Aref, I Kamel, S Ghandeharizadeh

Modern video servers support both video-on-demand and nonlinear editing applications. Video-on-demand servers enable the user to view video clips or movies from a video database, while nonlinear editing systems enable the user to manipulate the content of the video database. Applications such as video and news editing systems require that the underlying storage server be able to concurrently record live broadcast information, modify prerecorded data, and broadcast an authored presentation. A multimedia storage server that efficiently supports such a diverse group of activities constitutes the focus of this study. A novel real-time disk scheduling algorithm is presented that treats both read and write requests in a homogeneous manner in order to ensure that their deadlines are met. Due to real-time demands of movie viewing, read requests have to be fulfilled within certain deadlines; otherwise, they are considered lost. Since the data to be written to disk is stored in main memory buffers, write requests can be postponed until critical read requests are processed. However, write requests still have to be processed within reasonable delays and without the possibility of indefinite postponement. This is due to the physical constraint of the limited size of the main memory write buffers. The new algorithm schedules both read and write requests appropriately, to minimize the number of disk reads that do not meet their presentation deadlines, and to avoid indefinite postponement and large buffer sizes in the case of disk writes. Simulation results demonstrate that the proposed algorithm offers low violations of read deadlines, reduces waiting time for lower priority disk requests, and improves the throughput of the storage server by enhancing the utilization of available disk bandwidth.

Added 2008-04-22

Smart VideoText: a video data model based on conceptual graphs

F Kokkoras, H Jiang, I Vlahavas, AK Elmagarmid, EN Houstis, WG Aref

An intelligent annotation-based video data model called Smart VideoText is introduced. It utilizes the conceptual graph knowledge representation formalism to capture the semantic associations among the concepts described in text annotations of video data. The aim is to achieve more effective query, retrieval, and browsing capabilities based on the semantic content of video data. Finally, a generic and modular video database architecture based on the Smart VideoText data model is described.

Added 2008-04-22

Scalable QoS-aware disk-scheduling

WG Aref, K El-Bassyouni, I Kamel, MF Mokbel

A new quality of service (QoS) aware disk scheduling algorithm is presented. It is applicable in environments where data requests arrive with different QoS requirements, such as real-time deadlines and user priorities. Previous work on disk scheduling has focused on optimizing the seek times and/or meeting the real-time deadlines. A unified framework for QoS disk scheduling is presented that scales with the number of scheduling parameters. The general idea is based on modeling the disk scheduler requests as points in the multi-dimensional space, where each of the dimensions represents one of the parameters (e.g., one dimension represents the request deadline, another represents the disk cylinder number, and a third dimension represents the priority of the request). Then the disk scheduling problem reduces to the problem of finding a linear order to traverse these multi-dimensional points. Space-filling curves are adopted to define a linear order for sorting and scheduling objects that lie in the multi-dimensional space. This generalizes the one-dimensional disk scheduling algorithms (e.g., EDF, SATF, FIFO). Several techniques are presented to show how a QoS-aware disk scheduler deals with the progressive arrival of requests over time. Simulation experiments are presented to show a comparison of the alternative techniques and to demonstrate the scalability of the proposed QoS-aware disk scheduling algorithm over other traditional approaches.
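
As a rough illustration of ordering multi-dimensional requests along a space-filling curve, the sketch below uses the Z-order (Morton) curve. The abstract does not say which curve the paper uses, so the choice of Z-order, the function names `z_order_key` and `schedule`, the request layout, and the convention that smaller values are more urgent are all assumptions.

```python
def z_order_key(coords, bits=8):
    """Interleave the bits of each coordinate into one integer (Z-order / Morton code)."""
    key = 0
    for bit in range(bits - 1, -1, -1):  # most significant bit first
        for c in coords:
            key = (key << 1) | ((c >> bit) & 1)
    return key

def schedule(requests, bits=8):
    """requests: list of (name, deadline, cylinder, priority), each value in [0, 2**bits).

    Returns the request names in the order their points are visited along the curve.
    """
    return [r[0] for r in sorted(requests, key=lambda r: z_order_key(r[1:], bits))]
```

Sorting on a single coordinate recovers a one-dimensional policy (for instance, deadline only behaves like EDF), which is the sense in which a curve over several coordinates generalizes the one-dimensional schedulers.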

Added 2008-04-22

Spectral LPM: an optimal locality-preserving mapping using the spectral (not fractal) order

MF Mokbel, WG Aref, A Grama

For the past two decades, fractals (e.g., the Hilbert and Peano space-filling curves) have been considered the natural method for providing a locality-preserving mapping. The idea behind a locality-preserving mapping is to map points that are nearby in the multidimensional space into points that are nearby in the one-dimensional space. We argue against the use of fractals in locality-preserving mapping algorithms, and present examples with experimental evidence to show why fractals produce poor locality-preserving mappings. In addition, we propose an optimal locality-preserving mapping algorithm, termed the spectral locality-preserving mapping algorithm (Spectral LPM, for short), that makes use of the spectrum of the multidimensional space. We give a mathematical proof for the optimality of Spectral LPM, and also demonstrate its practical use.

Added 2008-04-22

Location-Aware Query Processing and Optimization

MF Mokbel, WG Aref

Added 2008-04-22

The New Casper: A Privacy-Aware Location-Based Database Server

MF Mokbel, CY Chow, WG Aref

This demo presents Casper, a framework in which users entertain anonymous location-based services. Casper consists of two main components: the location anonymizer, which blurs users' exact locations into cloaked spatial regions, and the privacy-aware query processor, which is responsible for providing location-based services based on the cloaked spatial regions. While the location anonymizer is implemented as a stand-alone application, the privacy-aware query processor is embedded into PLACE, a research prototype for location-based database servers.

Added 2008-04-22

Phenomenon-Aware Stream Query Processing

MH Ali, MF Mokbel, WG Aref

Spatio-temporal data streams that are generated from mobile stream sources (e.g., mobile sensors) experience similar environmental conditions that result in distinct phenomena. Several research efforts are dedicated to detecting and tracking various phenomena inside a data stream management system (DSMS). In this paper, we use the detected phenomena to reduce the demand on the DSMS resources. The main idea is to let the query processor observe the input data streams at the phenomena level. Then, each incoming continuous query is directed only to those phenomena that participate in the query answer. Two levels of indexing are employed: a phenomenon index and a query index. The phenomenon index provides a fine resolution view of the input streams that participate in a particular phenomenon. The query index utilizes the phenomenon index to maintain a query deployment map in which each input stream is aware of the set of continuous queries whose answers it contributes to. Both indices are updated dynamically in response to the evolving nature of phenomena and to the mobility of the stream sources. Experimental results show the efficiency of this approach with respect to the accuracy of the query result and the resource utilization of the DSMS.

Added 2008-04-22

Automatic image segmentation by integrating color-edge extraction and seeded region growing

J Fan, DKY Yau, AK Elmagarmid, WG Aref

We propose a new automatic image segmentation method. Color edges in an image are first obtained automatically by combining an improved isotropic edge detector and a fast entropic thresholding technique. After the obtained color edges have provided the major geometric structures in an image, the centroids between these adjacent edge regions are taken as the initial seeds for seeded region growing (SRG). These seeds are then replaced by the centroids of the generated homogeneous image regions by incorporating the required additional pixels step by step. Moreover, the results of color-edge extraction and SRG are integrated to provide homogeneous image regions with accurate and closed boundaries. We also discuss the application of our image segmentation method to automatic face detection. Furthermore, semantic human objects are generated by a seeded region aggregation procedure which takes the detected faces as object seeds.

Added 2008-04-22

Exploiting time-varying relationships in statistical relational models

Umang Sharan, Jennifer Neville

In a growing number of relational domains, the data record temporal sequences of interactions among entities. For example, in citation domains authors publish scientific papers together each year, and in telephone fraud detection domains people make calls to each other each day. The temporal dynamics of these interactions contain information that can improve predictive models (e.g., people publishing together frequently are likely to be publishing on the same topic), but to date there has been little effort to incorporate time-varying dependencies into relational models. Past work in relational learning has focused primarily on static “snapshots” of relational data. In this paper, we present an initial approach to modeling dynamic relational data graphs in predictive models of attributes. More specifically, we use a two-step process that first summarizes the dynamic graph with a weighted static graph and then incorporates the link weights in a relational Bayes classifier. We evaluate our approach on the Cora dataset (where co-author and citation links vary over time), showing that our approach results in significant performance gains over a baseline snapshot approach that ignores the temporal component of the data.
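
The first of the two steps, collapsing a dynamic graph into a weighted static graph, might look like the sketch below. The abstract does not state how the weights are computed, so the exponential decay, the `decay` parameter, and the function name `summarize` are assumptions chosen only to make the idea concrete. The classifier step is not shown.

```python
import math
from collections import defaultdict

def summarize(edges, now, decay=0.5):
    """Collapse timestamped links into one weighted, undirected static graph.

    edges: iterable of (u, v, t) where t is the time step of the interaction.
    Returns {(u, v): weight} with u <= v. Recent links weigh more than old ones.
    """
    weights = defaultdict(float)
    for u, v, t in edges:
        key = (u, v) if u <= v else (v, u)  # treat links as undirected
        weights[key] += math.exp(-decay * (now - t))  # assumed decay kernel
    return dict(weights)
```

Under this weighting, authors who published together often and recently end up with a heavier link than a pair with a single old collaboration.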

Added 2008-04-22