There has been concern over the apparent conflict between privacy and data mining. There is no inherent conflict, as most types of data mining produce summary results that do not reveal information about individuals. However, the process of data mining may use private data, leading to the potential for privacy breaches. Secure Multiparty Computation shows that results can be produced without revealing the data used to generate them. The problem is that general techniques for secure multiparty computation do not scale to data-mining-size computations. This paper presents an efficient protocol for securely determining the size of set intersection, and shows how this can be used to generate association rules where multiple parties have different (and private) information about the same set of individuals.
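The abstract does not say how the protocol works. As an illustration only, the sketch below computes the size of a two-party set intersection with commutative encryption (modular exponentiation under secret keys), a technique commonly used for this kind of problem. The modulus `P`, the helper names, and the single-process simulation of both parties are all assumptions, and the code is a toy: it is not secure as written and is not presented as the paper's protocol.

```python
import hashlib
import secrets
from math import gcd

# Toy modulus (assumption): the Mersenne prime 2^127 - 1.
P = 2**127 - 1

def to_group(item: str) -> int:
    """Hash an item to an integer in [2, P-1]."""
    digest = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2

def keygen() -> int:
    """Pick a secret exponent coprime to P-1, so x -> x^k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if gcd(k, P - 1) == 1:
            return k

def encrypt(values, key):
    """Apply the commutative 'encryption' x -> x^key mod P to every value."""
    return {pow(v, key, P) for v in values}

def intersection_size(set_a, set_b):
    """Simulate both parties locally; only encrypted values would be exchanged."""
    key_a, key_b = keygen(), keygen()
    a_once = encrypt({to_group(x) for x in set_a}, key_a)  # sent from A to B
    b_once = encrypt({to_group(y) for y in set_b}, key_b)  # sent from B to A
    a_twice = encrypt(a_once, key_b)  # B adds its key to A's values
    b_twice = encrypt(b_once, key_a)  # A adds its key to B's values
    # (x^a)^b == (x^b)^a, so shared items collide after double encryption.
    return len(a_twice & b_twice)

# Each party holds the IDs of individuals that have its (private) attribute.
party_a = {"id1", "id2", "id3"}
party_b = {"id2", "id3", "id4"}
shared = intersection_size(party_a, party_b)
```

In the association-rule setting, the count of individuals who satisfy both parties' attributes is exactly such an intersection size, and dividing it by the total number of individuals gives the support of the combined itemset.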
Watermarking is a frequently used tool for digital rights management. An example of this is using watermarks to place ownership information into an object. There are many instances where placing multiple watermarks into the same object is desired. One mechanism that has been proposed for doing this is segmenting the data into a grid and placing watermarks into different regions of the grid. This is particularly suited for images and geographic information systems (GIS) databases, as they already consist of a fine-granularity grid (of pixels, geographic regions, etc.); a grid cell for watermarking is an aggregation of the original fine-granularity cells. An attacker may be interested in only a subset of the watermarked data, and it is crucial that the watermarks survive in the subset selected by the attacker. In the kind of data mentioned above (images, GIS, etc.), such an attack typically consists of cropping, e.g., selecting a geographic region between two latitudes and longitudes (in the GIS case) or a rectangular region of pixels (in an image). The contribution of this paper is a set of schemes, together with their analysis, for multiple watermark placement that maximizes resilience to the above-mentioned cropping attack. This involves the definition of various performance metrics and their use in evaluating and comparing various placement schemes.
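To make the cropping attack concrete, the following minimal sketch assigns watermark identifiers to grid cells under two placements and counts how many distinct watermarks remain in a cropped rectangle. Both placements (`place_blocks`, `place_cyclic`) and the "number of surviving watermarks" metric are hypothetical illustrations, not the schemes or metrics defined in the paper.

```python
def place_blocks(rows, cols, blocks_per_side):
    """Hypothetical placement: one watermark per large contiguous block."""
    b = blocks_per_side
    return [[(r * b // rows) * b + (c * b // cols) for c in range(cols)]
            for r in range(rows)]

def place_cyclic(rows, cols, k):
    """Hypothetical placement: k watermarks interleaved along diagonals."""
    return [[(r + c) % k for c in range(cols)] for r in range(rows)]

def surviving(grid, top, left, height, width):
    """Set of watermark ids present in the cropped rectangle."""
    return {grid[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)}

# Four watermarks on a 4x4 grid of watermarking cells.
blocks = place_blocks(4, 4, 2)
cyclic = place_cyclic(4, 4, 4)

# A 2x2 crop at the top-left corner.
crop = (0, 0, 2, 2)
blocks_left = surviving(blocks, *crop)
cyclic_left = surviving(cyclic, *crop)
```

In this toy case the corner crop keeps only one watermark under the block placement but three under the interleaved one, which is the kind of difference a placement metric would need to capture.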
A common technique for improving performance in a database is to decluster the database among multiple disks so that data retrieval can be parallelized. In this paper we focus on answering range queries in a multidimensional database (such as a GIS), where each of its dimensions is divided uniformly to obtain tiles which are placed on different disks; there has been a significant amount of research on this problem (a subset of which is [1,2,3,4,5,6,7,8,9,11,12,13,14,15]). A declustering scheme would be optimal if any range query could be answered with no more than ⌈(# of tiles inside the range)/(# of disks)⌉ retrievals from any one disk. However, it was shown in [1] that this is not achievable in many cases even for two dimensions, and therefore much of the research in this area has focused on developing schemes that perform close to optimal. Recently, the idea of using replication (i.e., placing records on more than one disk) to increase performance has been introduced [7,12,13,15]. If replication is used, a retrieval schedule (i.e., which disk to retrieve each tile from) must be computed whenever a query is being processed. In this paper we introduce a class of replicated schemes where the retrieval schedule can be computed in time O(# of tiles inside the query’s range), which is asymptotically equivalent to query retrieval for the non-replicated case. Furthermore, this class of schemes has a strong performance advantage over non-replicated schemes, and several schemes are introduced that are either optimal or within a constant additive factor of optimal. Also presented in this paper is a strictly optimal scheme for any number of colors that requires the lowest known level of replication of any such scheme.
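The optimality criterion can be checked numerically. The sketch below compares the per-disk retrieval cost of a range query against the optimal bound, using the well-known disk-modulo assignment (tile (i, j) goes to disk (i + j) mod M) as a stand-in non-replicated scheme. That scheme and all function names are assumptions chosen for illustration; it is not one of the schemes introduced in the paper.

```python
from collections import Counter
from math import ceil

def optimal_cost(num_tiles, num_disks):
    """Best possible number of retrievals from the busiest disk."""
    return ceil(num_tiles / num_disks)

def disk_modulo(i, j, num_disks):
    """Illustrative non-replicated scheme: tile (i, j) -> disk (i + j) mod M."""
    return (i + j) % num_disks

def query_cost(rows, cols, num_disks, assign=disk_modulo):
    """Retrievals from the busiest disk for the range query rows x cols."""
    load = Counter(assign(i, j, num_disks) for i in rows for j in cols)
    return max(load.values())

# A 2x2 range query on 3 disks: the scheme meets the bound.
cost_3 = query_cost(range(2), range(2), 3)
best_3 = optimal_cost(4, 3)

# The same query on 4 disks: two tiles land on one disk, so it is not optimal.
cost_4 = query_cost(range(2), range(2), 4)
best_4 = optimal_cost(4, 4)
```

With four disks the 2x2 query costs two retrievals on one disk while the bound is one, a small instance of the gap that non-replicated schemes cannot always close.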
Most work on watermarking has resulted in techniques for different types of data: image, audio, video, text/language, software, etc. In this paper we discuss the watermarking of abstract structured aggregates of multiple types of content, such as multi-type/media documents. These semi-structures can usually be represented as graphs and are characterized by value lying both in the structure and in the individual nodes. Example instances include XML documents, complex web content, workflow and planning descriptions, etc. We propose a scheme for watermarking abstract semi-structures and discuss its resilience with respect to attacks. While content-specific watermarking deals with the issue of protecting the value in the structure’s nodes, protecting the value pertaining to the structure itself is a new, distinct challenge. Nodes in semi-structures are value-carrying, thus a watermarking algorithm could make use of their encoding capacity by using traditional watermarking. For example, if a node contains an image, then image watermarking algorithms can be deployed for that node to encode parts of the global watermark. But, given the intrinsic value attached to it, the graph that “glues” these nodes together is in itself a central element of the watermarking process we propose. We show how our approach makes use of these two value facets, structural and node-content.