Reports and Papers Archive - Reports & Papers

GPAC: generic and progressive processing of mobile queries over mobile data

MF Mokbel, WG Aref

This paper introduces a new family of Generic and Progressive algorithms (GPAC, for short) for continuous mobile queries over mobile objects. GPAC provides a general skeleton that can be tuned through a set of methods to behave as various continuous queries (e.g., continuous range queries and continuous k-nearest-neighbor queries). GPAC algorithms aim to achieve three goals: (1) online evaluation through in-memory processing of the incoming mobile data; (2) progressive evaluation through an incremental evaluation paradigm; and (3) fast query response through an anticipation paradigm, in which the query answer is anticipated and cached in memory to allow for fast evaluation. GPAC algorithms are encapsulated in physical pipelined query operators, which can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries. Experimental results based on a real implementation inside a prototype streaming database engine show the efficiency of GPAC operators in providing incremental and fast response for continuous queries.
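The incremental-evaluation goal can be illustrated with a small sketch. This is a hypothetical illustration, not GPAC's operator interface: it assumes a single continuous range query over a rectangle (`Rect` and `ContinuousRangeQuery` are invented names) and, instead of re-evaluating the query on every location update, reports only the change to the cached answer as a positive (`+`) or negative (`-`) update. The anticipation paradigm is not modeled.

```python
from dataclasses import dataclass, field
from typing import Hashable, Optional, Set, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned query rectangle (hypothetical representation)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def contains(self, x: float, y: float) -> bool:
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


@dataclass
class ContinuousRangeQuery:
    region: Rect
    answer: Set[Hashable] = field(default_factory=set)  # cached in-memory answer

    def on_update(self, obj_id: Hashable, x: float, y: float) -> Optional[Tuple[str, Hashable]]:
        """Apply one location update and return only the change to the answer."""
        inside_now = self.region.contains(x, y)
        inside_before = obj_id in self.answer
        if inside_now and not inside_before:
            self.answer.add(obj_id)
            return ("+", obj_id)   # positive update: object entered the region
        if inside_before and not inside_now:
            self.answer.discard(obj_id)
            return ("-", obj_id)   # negative update: object left the region
        return None                # answer unchanged, nothing to emit
```

For example, feeding the updates a@(1,1), b@(20,5), a@(15,2), b@(5,5) to a query over Rect(0, 0, 10, 10) yields +a, -a, +b, and the cached answer ends as {b}.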

Added 2008-04-08

Pipelined spatial join processing for quadtree-based indexes

W Aref

Spatial join is an important yet costly operation in spatial databases. In order to speed up the execution of a spatial join, the input tables are often indexed based on their spatial attributes. The quadtree index structure is a well-known index for organizing spatial database objects. It has been implemented in several database management systems, e.g., in Oracle Spatial and in PostgreSQL (via SP-GiST). Queries typically involve multiple pipelined spatial join operators that fit together in a query evaluation plan. In order to extend the applicability of these spatial joins, they are optimized so that, upon receiving sorted input, they produce sorted output for the spatial join operators in the upper levels of the query evaluation pipeline. This paper investigates the use of quadtree-based spatial join algorithms and how they can be adapted to answer queries that involve multiple pipelined spatial joins in a query evaluation plan. The paper investigates several adaptations to pipelined spatial join algorithms and their performance for the cases when both input tables are indexed, when only one of the tables is indexed while the second table is sorted, and when both tables are sorted but are not indexed.

Added 2008-04-08

Challenges in spatiotemporal stream query optimization

HG Elmongui, M Ouzzani, WG Aref

Simplified technology and low costs have spurred the use of location-detection devices in moving objects. Usually, these devices send the moving objects’ location information to a spatio-temporal data stream management system, which is then responsible for answering spatio-temporal queries related to these moving objects. A large body of research has been devoted to continuous spatio-temporal query processing. However, we argue that several outstanding challenges have been addressed only partially, or not at all, in the existing literature. In particular, in this paper, we focus on the optimization of multi-predicate spatio-temporal queries on moving objects. We present several major challenges related to the lack of spatio-temporal pipelined operators, and the impact of time, space, and their combination on query plan optimality under different circumstances of query and object distributions. We show that building an adaptive query optimization framework is key to addressing these challenges and coping with the dynamic nature of this environment.

Added 2008-04-08

Irregularity in multi-dimensional space-filling curves with applications in multimedia databases

MF Mokbel, WG Aref

A space-filling curve is a way of mapping the multi-dimensional space into the one-dimensional space. It acts like a thread that passes through every cell element (or pixel) in the D-dimensional space so that every cell is visited exactly once. Thus, a space-filling curve imposes a linear order on the cells in the D-dimensional space. There are numerous kinds of space-filling curves; they differ in how they map the space to one dimension. Selecting the appropriate curve for an application therefore requires some knowledge of the mapping scheme provided by each space-filling curve. Irregularity is proposed as a quantitative measure of the quality of a space-filling curve’s mapping. Closed formulas are developed to compute the irregularity for any general dimension D with N points in each dimension for different space-filling curves. A comparative study of different space-filling curves with respect to irregularity is conducted, and the results are presented and discussed. The applicability of this research in the area of multimedia databases is illustrated with a discussion of the problems that arise there.
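To make "imposes a linear order" concrete, the sketch below computes one well-known space-filling curve, the Z-order (Morton) curve, for a 2-D grid of 2**bits by 2**bits cells by interleaving the bits of the cell coordinates. It does not compute the irregularity measure discussed above, and the function names are hypothetical.

```python
from typing import List, Tuple


def morton_2d(x: int, y: int, bits: int) -> int:
    """Z-order index of cell (x, y): x bits go to even positions, y bits to odd ones."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order(bits: int) -> List[Tuple[int, int]]:
    """All cells of a 2**bits x 2**bits grid, listed in Z-order."""
    n = 1 << bits
    cells = [(x, y) for x in range(n) for y in range(n)]
    return sorted(cells, key=lambda c: morton_2d(c[0], c[1], bits))
```

Because bit interleaving is a bijection, every cell receives a distinct index from 0 to 4**bits - 1, so the curve visits each cell exactly once; for bits = 1 the order is (0,0), (1,0), (0,1), (1,1).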

Added 2008-04-08

Trojan Horse Resistant Discretionary Access Control

CERIAS TR 2008-8

Ziqing Mao, Ninghui Li, Hong Chen, Xuxian Jiang

Modern operating systems primarily use Discretionary Access Control (DAC) to protect files and other operating system resources. DAC mechanisms are more user-friendly than Mandatory Access Control (MAC) systems, but are vulnerable to trojan horse attacks and attacks exploiting buggy software. We show that it is possible to have the best of both worlds: DAC’s easy-to-use discretionary policy specification and MAC’s defense against trojan horses and buggy programs. This is made possible by a key new insight that DAC has this weakness not because it uses the discretionary principle, but because existing DAC enforcement mechanisms assume that a single principal is responsible for any request, whereas in reality a request may be influenced by multiple principals; thus these mechanisms cannot correctly identify the true origin(s) of a request and fall prey to trojan horses. We propose to solve this problem by combining DAC’s policy specification with new enforcement techniques that use ideas from MAC’s information flow tracking. Our model, called Information Flow Enhanced Discretionary Access Control (IFEDAC), is the first DAC model that can defend against trojan horses and attacks exploiting buggy software. IFEDAC significantly strengthens end host security, while preserving to a large degree DAC’s ease of use. In this paper, we present the IFEDAC model, analyze its security properties, and discuss our design and implementation for Linux.
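The key insight, that a request may be influenced by several principals, can be sketched in a few lines. This is a deliberately simplified, hypothetical model (the `File` and `Process` classes and their fields are invented for illustration), not the IFEDAC rules themselves: a process accumulates the principals whose data it has read, and a write is allowed only if the discretionary policy authorizes every one of them.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass
class File:
    name: str
    writers: FrozenSet[str]   # principals the DAC policy lets write this file
    sources: FrozenSet[str]   # principals whose input may have shaped the contents


@dataclass
class Process:
    principals: Set[str] = field(default_factory=set)  # everyone who influenced it

    def read(self, f: File) -> None:
        # Information-flow tracking: reading a file adds whoever shaped it.
        self.principals |= f.sources

    def may_write(self, f: File) -> bool:
        # Discretionary check applied to *all* influencing principals,
        # not only to the user who started the process.
        return self.principals <= f.writers
```

In this toy model, a process started by `alice` may write her `.bashrc` until it reads a downloaded script whose contents came from `remote`; after that the write is refused, because `remote` also influences the request.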

Added 2008-04-08

Rank-aware query optimization

IF Ilyas, R Shah, WG Aref, JS Vitter, AK Elmagarmid

Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query processing and optimization. We introduce a rank-aware query optimization framework that fully integrates rank-join operators into relational query engines. The framework is based on extending the System R dynamic programming algorithm in both enumeration and pruning. We define ranking as an interesting property that triggers the generation of rank-aware query plans. Unlike traditional join operators, optimizing for rank-join operators depends on estimating the input cardinality of these operators. We introduce a probabilistic model for estimating the input cardinality, and hence the cost of a rank-join operator. To our knowledge, this paper is the first effort in estimating the needed input size for optimal rank aggregation algorithms. Costing ranking plans, although challenging, is key to the full integration of rank-join operators in real-world query processing engines. We experimentally evaluate our framework by modifying the query optimizer of an open-source database management system. The experiments show the validity of our framework and the accuracy of the proposed estimation model.

Added 2008-04-08

LUGrid: Update-tolerant Grid-based Indexing for Moving Objects

X Xiong, MF Mokbel, WG Aref

Indexing moving objects is a fundamental issue in spatiotemporal databases. In this paper, we propose an adaptive Lazy-Update Grid-based index (LUGrid, for short) that minimizes the cost of object updates. LUGrid is designed with two important features, namely, lazy insertion and lazy deletion. Lazy insertion reduces update I/Os by adding a memory-resident layer over the disk index. Lazy deletion reduces update cost by not deleting each obsolete entry immediately; instead, obsolete entries are removed later by specially designed mechanisms. LUGrid adapts to object distributions through cell splitting and merging. Theoretical analysis and experimental results indicate that LUGrid outperforms previous approaches by up to a factor of eight when processing intensive updates, while yielding similar search performance.
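Lazy deletion can be illustrated with a toy in-memory grid. In this hypothetical sketch (not LUGrid's actual structure; the disk layer, lazy insertion, and cell splitting and merging are all omitted), an update appends a new entry and bumps a memory-resident version number instead of deleting the old entry; searches skip stale entries, and `cleanup` purges them from a cell in one batch.

```python
from collections import defaultdict


class LazyGrid:
    """Toy grid index illustrating lazy deletion (hypothetical design)."""

    def __init__(self, cell_size: float):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # cell -> [(obj_id, x, y, version)]
        self.latest = {}                # obj_id -> current version (memory-resident)

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        # Lazy deletion: the old entry stays where it is; only the version changes.
        version = self.latest.get(obj_id, 0) + 1
        self.latest[obj_id] = version
        self.cells[self._cell(x, y)].append((obj_id, x, y, version))

    def range_search(self, x1, y1, x2, y2):
        result = set()
        cx1, cy1 = self._cell(x1, y1)
        cx2, cy2 = self._cell(x2, y2)
        for cx in range(cx1, cx2 + 1):
            for cy in range(cy1, cy2 + 1):
                for obj_id, x, y, v in self.cells.get((cx, cy), ()):
                    # Ignore obsolete entries that have not been cleaned up yet.
                    if v == self.latest[obj_id] and x1 <= x <= x2 and y1 <= y <= y2:
                        result.add(obj_id)
        return result

    def cleanup(self, cell):
        """Remove all obsolete entries from one cell in a single batch."""
        self.cells[cell] = [e for e in self.cells[cell] if e[3] == self.latest[e[0]]]
```

Moving an object from (1, 1) to (25, 25) leaves its old entry in cell (0, 0), but a search over that cell no longer returns it; the entry disappears only when `cleanup((0, 0))` runs.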

Added 2008-04-08

The new Casper: query processing for location services without compromising privacy

MF Mokbel, Chi-Yin Chow, WG Aref

This paper tackles a major privacy concern in current location-based services, where users have to continuously report their locations to the database server in order to obtain the service. For example, a user asking about the nearest gas station has to report her exact location. With untrusted servers, reporting the location information may lead to several privacy threats. In this paper, we present Casper, a new framework in which mobile and stationary users can obtain location-based services without revealing their location information. Casper consists of two main components: the location anonymizer and the privacy-aware query processor. The location anonymizer blurs the users’ exact location information into cloaked spatial regions based on user-specified privacy requirements. The privacy-aware query processor is embedded inside the location-based database server in order to deal with the cloaked spatial areas rather than the exact location information. Experimental results show that Casper achieves high-quality location-based services while providing anonymity for both data and queries.
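A location anonymizer of the kind described can be sketched with a pyramid of grids over the unit square. The sketch assumes that the user-specified privacy requirements are a minimum number of users `k` and a minimum area `min_area`; it climbs from the finest grid to coarser ones and returns the first cell containing the user that satisfies both. The function name, its parameters, and the pyramid height are assumptions, not Casper's actual interface.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Region = Tuple[float, float, float, float]


def _cell(p: Point, n: int) -> Tuple[int, int]:
    """Grid cell of point p in an n x n grid over the unit square."""
    return (min(int(p[0] * n), n - 1), min(int(p[1] * n), n - 1))


def cloak(users: Dict[str, Point], user_id: str, k: int, min_area: float,
          height: int = 4) -> Region:
    """Blur user_id's location into a grid cell holding >= k users and
    covering >= min_area (hypothetical privacy profile)."""
    target = users[user_id]
    for level in range(height, -1, -1):  # finest grid first, then coarser
        n = 1 << level                    # this level has n x n cells
        side = 1.0 / n
        cell = _cell(target, n)
        count = sum(1 for p in users.values() if _cell(p, n) == cell)
        if count >= k and side * side >= min_area:
            x1, y1 = cell[0] * side, cell[1] * side
            return (x1, y1, x1 + side, y1 + side)
    # Requirements cannot be met (e.g. k > len(users)): fall back to the whole space.
    return (0.0, 0.0, 1.0, 1.0)
```

The server then receives only the returned rectangle, never the exact point; a stricter profile (larger `k` or `min_area`) yields a larger, coarser region.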

Added 2008-04-08

Joining ranked inputs in practice

IF Ilyas, WG Aref, AK Elmagarmid

Joining ranked inputs is an essential requirement for many database applications, such as ranking search results from multiple search engines and answering multi-feature queries for multimedia retrieval systems. We introduce a new practical pipelined query operator, termed NRA-RJ, that produces a global rank from input ranked streams based on a score function. The output of NRA-RJ can serve as a valid input to other NRA-RJ operators in the query pipeline. Hence, the NRA-RJ operator can support a hierarchy of join operations and can be easily integrated in query processing engines of commercial database systems. The NRA-RJ operator bridges Fagin’s optimal aggregation algorithm into a practical implementation and contains several optimizations that address performance issues. We compare the performance of NRA-RJ against recent rank join algorithms. Experimental results demonstrate the performance trade-offs among these algorithms. The experimental results are based on an empirical study applied to a medical video application on top of a prototype database system. The study reveals important design options and shows that the NRA-RJ operator outperforms other pipelined rank join operators when the join condition is an equi-join on key attributes.
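The abstract does not describe NRA-RJ's internals, so the sketch below shows a simpler, generic threshold-based rank join instead; it is a different technique, not NRA-RJ. It assumes two inputs sorted by descending score, an equi-join on a key that is unique within each input, and a combined score equal to the sum of the two input scores. A join result is emitted as soon as no unseen result can outscore it, so the output is itself ranked and could in principle feed another rank-join operator in a pipeline.

```python
import heapq
from typing import List, Tuple

Scored = Tuple[str, float]  # (join key, score)


def rank_join(left: List[Scored], right: List[Scored], k: int) -> List[Scored]:
    """Top-k results of an equi-join on key, ranked by left score + right score.
    Both inputs must be sorted by score in descending order, with unique keys."""
    if not left or not right:
        return []
    inputs = (left, right)
    seen = ({}, {})                     # key -> score already read, per input
    pos = [0, 0]                        # next unread position, per input
    heap: List[Tuple[float, str]] = []  # candidate results as (negated score, key)
    out: List[Scored] = []

    def bound(i: int) -> float:
        # Best score any not-yet-formed result using an unread tuple of input i could reach.
        if pos[i] >= len(inputs[i]):
            return float("-inf")        # input i is exhausted
        return inputs[i][pos[i]][1] + inputs[1 - i][0][1]

    turn = 0
    while len(out) < k and (pos[0] < len(left) or pos[1] < len(right)):
        if pos[turn] >= len(inputs[turn]):
            turn = 1 - turn             # only the other input has tuples left
        key, score = inputs[turn][pos[turn]]
        pos[turn] += 1
        seen[turn][key] = score
        if key in seen[1 - turn]:
            heapq.heappush(heap, (-(score + seen[1 - turn][key]), key))
        threshold = max(bound(0), bound(1))
        # A candidate is final once no unseen result can beat it.
        while heap and len(out) < k and -heap[0][0] >= threshold:
            neg_score, best = heapq.heappop(heap)
            out.append((best, -neg_score))
        turn = 1 - turn
    return out
```

With left = [("a", 9), ("b", 8), ("c", 1)] and right = [("b", 9), ("c", 8), ("a", 1)], the top result ("b", 17) is emitted after only three tuples have been read, before either input is exhausted.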

Added 2008-04-08

Query Processing in Broadcasted Spatial Index Trees

S Hambrusch, Chuan-Ming Liu, WG Aref, S Prabhakar

The broadcasting of spatial data together with an index structure is an effective way of disseminating data in a wireless mobile environment. Mobile clients requesting data tune into a continuous broadcast only when spatial data of interest and relevance is available on the channel and thus minimize their power consumption. A mobile client experiences latency (time elapsed from requesting to receiving data) and tuning time (the amount of time spent listening to the channel). This paper studies the execution of spatial queries on broadcasted tree-based spatial index structures. The focus is on queries that require a partial traversal of the spatial index, not only a single-path root-to-leaf search. We present techniques for processing spatial queries while mobile clients are listening to a broadcast of the tree. Our algorithms can handle clients with limited memory and trees broadcast with a certain degree of replication of index nodes, and the algorithms executed at the clients may employ different data structures. Experimental work on R*-trees shows that these techniques lead to different tuning times and different latencies. Our solutions also lead to efficient methods for starting the execution of a query in the middle of a broadcast cycle. Spatial query processing in a multiple channel environment is also addressed.

Added 2008-04-08

SEA-CNN: scalable processing of continuous k-nearest neighbor queries in spatio-temporal databases

X Xiong, MF Mokbel, WG Aref

Location-aware environments are characterized by a large number of objects and a large number of continuous queries. Both the objects and continuous queries may change their locations over time. In this paper, we focus on continuous k-nearest neighbor queries (CKNN, for short). We present a new algorithm, termed SEA-CNN, for continuously answering a collection of concurrent CKNN queries. SEA-CNN has two important features: incremental evaluation and shared execution. SEA-CNN achieves both efficiency and scalability in the presence of a set of concurrent queries. Furthermore, SEA-CNN does not make any assumptions about the movement of objects, e.g., the objects’ velocities and the shapes of their trajectories, or about the mutability of the objects and/or the queries, i.e., moving or stationary queries issued on moving or stationary objects. We provide theoretical analysis of SEA-CNN with respect to the execution costs, memory requirements and effects of tunable parameters. Comprehensive experimentation shows that SEA-CNN is highly scalable and is more efficient in terms of both I/O and CPU costs in comparison to other R-tree-based CKNN techniques.

Added 2008-04-08

Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results

MF Mokbel, M Lu, WG Aref

This paper introduces the hash-merge join algorithm (HMJ, for short), a new non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) minimize the time to produce the first few results, and (2) produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. Both phases of the HMJ algorithm are connected via a flushing policy that flushes in-memory parts into disk storage once the memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art non-blocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
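The hashing phase produces join results as soon as data arrives. A standard way to do this is a symmetric hash join, sketched below on the assumption that it conveys the idea; HMJ's flushing policy and merging phase are omitted, and the function name and tuple format are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Iterator, List, Tuple


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Hashable, Any]]
) -> Iterator[Tuple[Any, Any]]:
    """Non-blocking in-memory equi-join of two sources, "A" and "B".

    `arrivals` yields (side, key, payload) in arrival order, with the two
    sources interleaved arbitrarily. Each matching (A, B) pair is emitted as
    soon as the second of its two tuples arrives."""
    tables: Dict[str, Dict[Hashable, List[Any]]] = {
        "A": defaultdict(list),
        "B": defaultdict(list),
    }
    for side, key, payload in arrivals:
        other = "B" if side == "A" else "A"
        tables[side][key].append(payload)          # build: remember the new tuple
        for match in tables[other].get(key, ()):   # probe: join with tuples seen so far
            yield (payload, match) if side == "A" else (match, payload)
```

Because every arriving tuple is both inserted and probed, results flow out while either source is still sending; the memory growth this causes is what a flushing policy such as HMJ's has to manage.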

Added 2008-04-08

Hierarchical video summarization for medical data

X Zhu, J Fan, AK Elmagarmid, WG Aref

To provide users with an overview of medical video content at various levels of abstraction which can be used for more efficient database browsing and access, a hierarchical video summarization strategy has been developed and is presented in this paper. To generate an overview, the key frames of a video are preprocessed to extract special frames (black frames, slides, clip art, sketch drawings) and special regions (faces, skin or blood-red areas). A shot grouping method is then applied to merge the spatially or temporally related shots into groups. The visual features and knowledge from the video shots are integrated to assign the groups into predefined semantic categories. Based on the video groups and their semantic categories, video summaries for different levels are constructed by group merging, hierarchical group clustering and semantic category selection. Based on this strategy, a user can select the layer of the summary to access. The higher the layer, the more concise the video summary; the lower the layer, the greater the detail contained in the summary.

Added 2008-04-08

Privacy-preserving data integration and sharing

Chris Clifton, Murat Kantarcıoğlu, AnHai Doan, Gunther Schadow, Jaideep Vaidya, Ahmed Elmagarmid, Dan Suciu

Added 2008-04-08

Rank-aware query optimization

Ihab F. Ilyas, Rahul Shah, Walid G. Aref, Jeffrey Scott Vitter, Ahmed K. Elmagarmid

Added 2008-04-08