Abstract
This paper introduces a novel method of rights protection for
categorical data through watermarking.
We identify new watermark embedding channels for relational data with
categorical types. We design novel watermark encoding algorithms and
analyze important theoretical bounds, including mark vulnerability.
While fully preserving data quality requirements, our solution survives
important attacks, such as subset selection and data re-sorting. Mark
detection is fully "blind" in that it does not require the original data,
an important characteristic, especially for massive data sets.
We propose various improvements and alternative encoding methods.
We perform validation experiments by watermarking the outsourced Wal-Mart
sales data available at our institute. We show, both experimentally and by
analysis, that our solution is highly resilient to both alteration and
data loss attacks, for example tolerating up to 80% data loss with a
watermark alteration of only 25%.