The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Hidden Pattern Statistics

Download

Download PDF Document
PDF

Author

Philippe Flajolet, Yves Guivarc'h, Wojciech Szpankowski, and Brigitte Vallee

Tech report number

CERIAS TR 2002-06

Entry type

article

Abstract

Two fundamental problems in combinatorics on words and string manipulation are string matching and sequence comparison. In string matching one searches for all occurrences of a given string, understood as a sequence of consecutive symbols, in a text. In sequence comparison a subsequence rather than a string is searched in a text. The string-matching problem has been extensively studied in literature from algorithmic and probabilistic points of view. The sequence comparison problem, also known as hidden pattern problem, is harder and it has been much less investigated. In this paper we study the number of occurrences of a given pattern w of length m as a subsequence in a random text of length n generated by a memoryless source. In particular, we consider two versions of this problem, namely the unconstrained one in which the subsequence w can appear anywhere in the text, and the constrained one that puts bounds on the distances between symbols of the word w. We determine the mean and the variance of the number of occurrences, and establish a Gaussian limit law. These results are obtained via combinatorics on words, formal languages, and methods of analytic combinatorics based on generating functions and moment methods. The motivation to study this problem comes from an attempt at finding a reliable threshold for intrusion detections, from textual data processing applications, and from molecular biology.

Download

PDF

Date

2001

Journal

LNCS 2076

Key alpha

Flajolet

Pages

152-165

Publication Date

1900-01-01

BibTex-formatted data

To refer to this entry, you may select and copy the text below and paste it into your BibTex document. Note that the text may not contain all macros that BibTex supports.