Algebraic Techniques for Analysis of Large Discrete-Valued Datasets
Author
Mehmet Koyuturk, Ananth Grama, Naren Ramakrishnan
Entry type
article
Abstract
With the availability of large scale computing platforms and instrumentation for data gathering, increased emphasis is being placed on efficient techniques for analyzing large and extremely high-dimensional datasets. In this paper, we present a novel algebraic technique based on a variant of semi-discrete matrix decomposition (SDD), which is capable of compressing large discrete-valued datasets in an error bounded fashion. We show that this process of compression can be thought of as identifying dominant patterns in underlying data. We derive efficient algorithms for computing dominant patterns, quantify their performance analytically as well as experimentally, and identify applications of these algorithms in problems ranging from clustering to vector quantization.We demonstrate the superior characteristics of our algorithm in terms of (i) scalability to extremely high dimensions; (ii) bounded error; and (iii) hierarchical nature, which enables multiresolution analysis. Detailed experimental results are provided to support these claims.
Date
2002
Booktitle
Principles of Data Mining and Knowledge Discovery
Key alpha
Grama
Pages
345-360
Publisher
Springer Berlin / Heidelberg
Series
Lecture Notes in Computer Science
Volume
2431
Affiliation
Purdue University
Publication Date
2002-00-00
Copyright
2002

