The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Average Profile of the Lempel-Ziv Parsing Scheme for a Markovian Source

Download

Download PDF Document
PDF

Author

Philippe Jacquet, Wojciech Sqpankowski, Jing Tang

Tech report number

CERIAS TR 2002-05

Entry type

article

Abstract

For a Markovian source, we analyze the Lempel-Ziv parsing scheme that partitions sequences into phrases such that a new phrase is the shortest phrase not seen in the past. We consider three models: In the Markov Independent model, several sequences are generated independently by Markovian sources, and the ith phrase is the shortest prefix of the ith sequence that was not seen before as a phrase (i.e., a prefix of previous (I - 1) sequences). In the other two models, only a single sequence is generated by a Markovian source. In the second model, called the Gilbert-Kadota model, a fixed number of phrases is generated according to the Lempel-Ziv algorithm, thus producing a sequence of a variable (random) length. In the last model, known also as the Lempel-Ziv model, a string of fixed length is partitioned into a variable (random) number of phrases. These three models can be efficiently represented and analyzed by digital search trees that are of interest to other algorithms such as sorting, searching and pattern matching. In this paper, we concentrate on analyzing the average profile (i.e., the average number of phrases of a given length), the typical phrase length, and the length of the last phrase. We obtain asymptotic expansions for the mean and the variance of the phrase length, and we prove that appropriately normalized phrase length in all three models tends to the standard normal distribution, which leads to bounds on the average redundancy of the Lempel-Ziv code. For Markov Independent model, this finding is established by analytic methods (i.e., generating functions, Mellin transform and depoissonization), while for the other two models we use a combination of analytic and probabilistic analyses.

Download

PDF

Date

2001

Institution

Purdue Univeristy

Journal

Algorithmica

Key alpha

Jacquet

Pages

318-360

Volume

31

Publication Date

1900-01-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.