Reports and Papers Archive - Reports & Papers

CERIAS Security Vision Roundtable Call to Action

Andersen Consulting & CERIAS

Added 2002-07-26

Ambiguity of Ultrashort Pulse Shapes Retrieved from the Intensity Autocorrelation and the Power Spectrum

CERIAS TR 2002-01

Jung-Ho Chung & Andrew M. Weiner

Download: PDF

We construct several examples of distinct asymmetric-symmetric pulse pairs with identical or essentially identical intensity autocorrelations and power spectra. From these examples we infer that pulse retrieval methods based on these two data sets alone produce ambiguous solutions. Furthermore, we used the constructed pulse pairs as test cases to assess the degree of difference in the corresponding interferometric autocorrelations. In several cases we found that the differences in the interferometric autocorrelations are sufficiently small that they might be quite challenging to distinguish in a practical experimental context.

Added 2002-07-26

A Note on the Asymptotic Behavior of the Height in b-Tries for b Large

CERIAS TR 2002-04

Charles Knessl, Wojciech Szpankowski

Download: PDF

We study the limiting distribution of the height in a generalized trie in which external nodes are capable of storing up to b items (the so-called b-tries). We assume that such a tree is built from n random strings (items) generated by an unbiased memoryless source. In this paper, we discuss the case when b and n are both large. We identify six natural regions of the height distribution, which should be compared to the three regions obtained for fixed b. We prove that for most n, the limiting distribution is concentrated at the single point k1 = [log2 (n/b)] + 1 as n, b approach infinity. We observe that this is quite different from the height distribution for fixed b, in which case the limiting distribution is of an extreme value type concentrated around (1 + 1/b) log2 n. We derive our results by analytic methods, namely generating functions and the saddle point method. We also present some numerical verification of our results.
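The height described in this abstract can be explored with a small simulation. The following is a minimal sketch, not the authors' method: it assumes the height is the smallest depth k at which every length-k prefix is shared by at most b strings, and it reads the brackets in k1 as the floor function. The names `b_trie_height` and `random_bits` are hypothetical helpers introduced for illustration.

```python
import random
from collections import Counter
from math import floor, log2


def b_trie_height(strings, b):
    """Return the smallest depth k at which no length-k prefix is shared
    by more than b strings (assumed definition of the b-trie height)."""
    max_len = min(len(s) for s in strings)
    for k in range(max_len + 1):
        # Count how many strings fall under each prefix of length k.
        counts = Counter(s[:k] for s in strings)
        if max(counts.values()) <= b:
            return k
    raise ValueError("strings are too short to be split into buckets of size b")


def random_bits(length, rng):
    """Generate one string from an unbiased memoryless binary source."""
    return "".join(rng.choice("01") for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    n, b = 4096, 64  # illustrative values only
    items = [random_bits(64, rng) for _ in range(n)]
    print("simulated height:", b_trie_height(items, b))
    print("k1 = floor(log2(n/b)) + 1 =", floor(log2(n / b)) + 1)
```

The concentration result is asymptotic in both n and b, so a single small run need not agree with k1.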

Added 2002-07-26

Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian Source

CERIAS TR 2002-05

Philippe Jacquet, Wojciech Szpankowski, Jing Tang

Download: PDF

For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models. In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., as a prefix of the previous (i - 1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, thus producing a sequence of variable (random) length. In the last model, also known as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees, which are of interest to other algorithms such as sorting, searching and pattern matching. In this paper, we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that the appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel-Ziv code. For the Markov Independent model, this finding is established by analytic methods (i.e., generating functions, the Mellin transform and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.
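The parsing rule in this abstract (each new phrase is the shortest one not seen before) can be illustrated for the fixed-length string model. This is a minimal sketch under that reading; `lz_parse` and `phrase_profile` are hypothetical names, and the profile here is a count for one sequence rather than the average studied in the report.

```python
from collections import Counter


def lz_parse(sequence):
    """Split a string into phrases so that each phrase is the shortest
    prefix of the remaining input not already seen as a phrase."""
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            # First time this phrase occurs: record it and start a new one.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover symbols form a last phrase that repeats an earlier one.
        phrases.append(current)
    return phrases


def phrase_profile(phrases):
    """Map each phrase length to the number of phrases of that length."""
    return Counter(len(p) for p in phrases)


if __name__ == "__main__":
    phrases = lz_parse("1011010100010")
    print(phrases)
    print(dict(phrase_profile(phrases)))
```

Storing the phrases in a set keeps the sketch short; a digital search tree, as mentioned in the abstract, would hold the same phrases with one node per phrase.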

Added 2002-07-26

Hidden Pattern Statistics

CERIAS TR 2002-06

Philippe Flajolet, Yves Guivarc'h, Wojciech Szpankowski, and Brigitte Vallee

Download: PDF

Two fundamental problems in combinatorics on words and string manipulation are string matching and sequence comparison. In string matching one searches for all occurrences of a given string, understood as a sequence of consecutive symbols, in a text. In sequence comparison a subsequence rather than a string is searched for in a text. The string-matching problem has been extensively studied in the literature from algorithmic and probabilistic points of view. The sequence comparison problem, also known as the hidden pattern problem, is harder and has been much less investigated. In this paper we study the number of occurrences of a given pattern w of length m as a subsequence in a random text of length n generated by a memoryless source. In particular, we consider two versions of this problem, namely the unconstrained one in which the subsequence w can appear anywhere in the text, and the constrained one that puts bounds on the distances between symbols of the word w. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal languages, and methods of analytic combinatorics based on generating functions and moment methods. The motivation to study this problem comes from an attempt at finding a reliable threshold for intrusion detection, from textual data processing applications, and from molecular biology.
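For the unconstrained version, the quantity under study can be computed directly for a given text. The sketch below is an illustration, not the report's analysis: `count_hidden_occurrences` is a standard dynamic program for counting subsequence occurrences, and `expected_occurrences` assumes the mean is C(n, m) times the product of the symbol probabilities, which follows from linearity of expectation for a memoryless source. Both function names are hypothetical.

```python
from math import comb, prod


def count_hidden_occurrences(text, pattern):
    """Count the occurrences of pattern as a (not necessarily contiguous)
    subsequence of text."""
    # ways[j] = number of ways to match pattern[:j] in the text read so far.
    ways = [1] + [0] * len(pattern)
    for symbol in text:
        # Go right to left so each text symbol is used at most once per match.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == symbol:
                ways[j] += ways[j - 1]
    return ways[-1]


def expected_occurrences(n, pattern, probs):
    """Mean number of unconstrained occurrences in a random text of length n,
    where probs maps each symbol to its probability."""
    return comb(n, len(pattern)) * prod(probs[c] for c in pattern)


if __name__ == "__main__":
    print(count_hidden_occurrences("banana", "ana"))
    print(expected_occurrences(3, "ab", {"a": 0.5, "b": 0.5}))
```

The constrained version, with bounds on the gaps between matched symbols, would need a different recurrence and is not covered here.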

Added 2002-07-26

Trustworthiness Based Authorization on WWW

CERIAS TR 2002-08

Yuhui Zhong, Bharat Bhargava, and Malika Mahoui

Download: PDF

Current approaches for authorization on Web servers are mostly based on a predefined set of users or domains. They are not suitable for Internet Web sites where the user set is unbounded and authorized users can be non-predefined. We propose an authorization approach that applies role-based access control (RBAC) to WWW. Under this approach, system administrators predefine roles, role-permission relations, and the policies that assign roles to users (user-role assignment policy). The system automatically collects trustworthy information (valid evidence) and assigns roles to Internet users according to user-role assignment policies. Trustworthiness information plays an important role in user-role assignment. The validity of evidence is assessed based on the trustworthiness information of the evidence provider. In addition, system administrators can specify the trustworthiness constraints that users have to satisfy for holding roles. In this paper, the schema of using RBAC on the Web and the procedure of user-role assignment are presented. The classification and evaluation of trustworthiness are discussed.

Added 2002-07-26

Separating Between Trust and Access Control Policies: A necessity for Web Applications

CERIAS TR 2002-07

Malika Mahoui, Bharat Bhargava, and Yuhui Zhong

Download: PDF

As security is the key to success for Web applications, most of the efforts that have been put into this domain have focused on winning users

Added 2002-07-26

The Height of a Binary Search Tree: The Limiting Distribution Perspective

CERIAS TR 2002-09

Charles Knessl, Wojciech Szpankowski

Download: PDF

We study the height of the binary search tree - the most fundamental data structure used for searching. We assume that the binary search tree is built from a random permutation of n elements. Under this assumption, we study the limiting distribution of the height as n approaches infinity. We show that the distribution has six asymptotic regions (scales). These correspond to different ranges of k and n where Pr{Hn <= k} is the height distribution. In the critical region (the so-called central region), where most of the probability mass is concentrated, the limiting distribution satisfies a non-linear integral equation. While we cannot solve this equation exactly, we show that both tails of the distribution are roughly of a double exponential form. From our analysis we conclude that the average height E[Hn] ~ A log n

Added 2002-07-26

The CROSS/Linux Value-added Services Router

Prem Gopalan

Added 2002-07-26

A Universal Predictor Based on Pattern Matching

CERIAS TR 2002-10

Philippe Jacquet, Wojciech Szpankowski, Izydor Apostol

Download: PDF

We consider a universal predictor based on pattern matching: Given a sequence X1

Added 2002-07-26

Multicast Tree Structure and the Power Law

CERIAS TR 2002-11

Cedric Adjih, Philippe Jacquet, Leonidas Georgiadis and Wojciech Szpankowski

Download: PDF

One of the main benefits of multicast communication is the overall reduction of network load. To quantify this reduction relative to traditional unicast, experimental studies by Chuang and Sirbu indicated the so-called power law, which asserts that the ratio R(m) of the average number of links in a multicast delivery tree connecting a source to m (distinct) sites to the average number of links in a unicast path satisfies R(m) ~ c m^0.8, where c is a constant. To explain this behavior theoretically, Phillips, Shenker, and Tangmunarunkit examined R(m) approximately for a V-ary complete tree topology, and concluded that R(m) grows nearly linearly with m, thus not obeying the power law. We first re-examine the analysis by Phillips et al. and provide a precise asymptotic expansion for R(m) that confirms the nearly linear (with some wobbling) growth. Claiming that the essence of the problem lies in the modeling assumptions, we replace the V-ary complete tree topology by a V-ary self-similar tree with similarity factor 0 < T < 1. In such a tree a node at level k is replicated C V^((D-k)T) times, where D is the depth of the tree and C is a constant. Under this assumption, we again analyze R(m) and prove that R(m) ~ c m^(1-T), where c is an explicitly computable constant. Hence self-similar trees provide a plausible explanation of the multicast power law. Next, we examine more general conditions for general trees under which the power law still holds. We also discuss some experimental results in real networks that reaffirm the power law and show that in these networks the general conditions hold. In particular, our experiments show that for the tested networks T ~ 0.12.

Added 2002-07-26

1999 CSI/FBI Computer Crime and Security Survey

Richard Power

Added 2002-07-26

A High Assurance Multilevel File Server for Off-The-Shelf Workstation Applications and Secure Messaging

James P. Anderson, Cynthia E. Irvine

Added 2002-07-26

2000 IEEE Symposium on Security and Privacy

CERIAS TR 2003-57

IEEE Computer Society

Download: PDF

Added 2002-07-26

Generalized Shannon Code Minimizes the Maximal Redundancy

CERIAS TR 2002-12

Michael Drmota and Wojciech Szpankowski

Download: PDF

Source coding, also known as data compression, is an area of information theory that deals with the design and performance evaluation of optimal codes for data compression. In 1952 Huffman constructed his optimal code that minimizes the average code length among all prefix codes for known sources. Actually, Huffman codes minimize the average redundancy, defined as the difference between the code length and the entropy of the source. Interestingly enough, no optimal code is known for other popular optimization criteria such as the maximal redundancy, defined as the maximum of the point-wise redundancy over all source sequences. We first prove that a generalized Shannon code minimizes the maximal redundancy among all prefix codes, and present an efficient implementation of the optimal code. Then we compute precisely its redundancy for memoryless sources. Finally, we study universal codes for unknown source distributions. We adopt the minimax approach and search for the best code for the worst source. We establish that such redundancy is a sum of the likelihood estimator and the redundancy of the generalized code computed for the maximum likelihood distribution. This replaces Shtarkov's bound by an exact formula. We also compute precisely the maximal minimax for a class of memoryless sources. The main findings of this paper are established by techniques that belong to the toolkit of the

Added 2002-07-26