Protecting rights over relational data is of ever-increasing interest, especially in areas where sensitive, valuable content is to be outsourced. A good example is a data mining application, where data is sold in pieces to parties specialized in mining it. Different avenues for rights protection are available, each with its own advantages and drawbacks. Enforcement by legal means is usually ineffective in preventing theft of copyrighted works unless augmented by a digital counterpart, for example watermarking. Recent research of the authors introduces the issue of digital watermarking for generic number sets. In the present paper we expand on this foundation and introduce a solution for rights protection over relational database content through watermarking. Our solution addresses important attacks, such as data re-sorting, subset selection, and linear data changes (applying a linear transformation to arbitrary subsets of the data). Our watermark also survives data losses of 50% and above. Finally, we present wmbd.*, a proof-of-concept implementation of our algorithm, and its application to real-life data, namely watermarking the outsourced Wal-Mart sales data available at our institute.
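The abstract does not detail the embedding procedure; purely as an illustration of key-based watermarking of numeric relational data (not the authors' actual scheme; all names and parameters below, such as gamma and xi, are invented), an embedder/detector pair might look like the following minimal Python sketch, where a secret key selects a pseudo-random subset of tuples and forces a key-determined low-order bit of a numeric attribute:

    # Illustrative sketch only: key-based tuple selection and bit embedding in a
    # numeric attribute. NOT the paper's algorithm; names and parameters are invented.
    import hmac, hashlib

    def _h(key: bytes, msg: str) -> int:
        return int.from_bytes(hmac.new(key, msg.encode(), hashlib.sha256).digest()[:8], "big")

    def embed(rows, key: bytes, gamma: int = 10, xi: int = 2):
        """Mark roughly 1/gamma of the rows by forcing one of the xi low-order
        bits of the numeric attribute to a key-determined value."""
        out = []
        for pk, value in rows:                      # rows: (primary key, integer attribute)
            if _h(key, f"sel|{pk}") % gamma == 0:   # secret subset selection
                pos = _h(key, f"pos|{pk}") % xi
                bit = _h(key, f"val|{pk}") % 2
                value = (value & ~(1 << pos)) | (bit << pos)
            out.append((pk, value))
        return out

    def detect(rows, key: bytes, gamma: int = 10, xi: int = 2):
        """Return (selected, matching); compare matching/selected to a threshold."""
        selected = matching = 0
        for pk, value in rows:
            if _h(key, f"sel|{pk}") % gamma == 0:
                selected += 1
                pos = _h(key, f"pos|{pk}") % xi
                if ((value >> pos) & 1) == _h(key, f"val|{pk}") % 2:
                    matching += 1
        return selected, matching

Because selection and detection depend only on the primary key and the secret key, such a mark is insensitive to re-sorting and remains detectable on subsets, which is the general flavor of the robustness properties the abstract claims.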
In this paper we study the problem of declustering two-dimensional datasets with replication over parallel devices to improve range query performance. The related problem of declustering without replication has been well studied. It has been established that strictly optimal declustering schemes do not exist if data is not replicated. In addition to the usual problem of identifying a good allocation, the replicated version of the problem needs to address the issue of identifying a good retrieval schedule for a given query. We address both problems in this paper. An efficient algorithm for finding a lowest cost retrieval schedule is developed. This algorithm works for any query, not just range queries. Two replicated placement schemes are presented: one that results in a strictly optimal allocation, and another that guarantees a retrieval cost that is either optimal or one more than the optimal for any range query.
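The abstract does not specify the allocation functions; the following minimal sketch (an assumption, not one of the paper's two schemes) illustrates the two sub-problems it mentions, replicated placement and retrieval scheduling, using a simple modular two-copy placement and an exact brute-force schedule for small queries:

    # Illustrative sketch, not the paper's schemes: a generic two-copy modular
    # placement for grid tiles and an exact (brute-force) retrieval schedule.
    from itertools import product

    M = 5  # number of parallel devices

    def replicas(i, j):
        """Two copies of tile (i, j), placed by simple modular functions."""
        return ((i + j) % M, (i + 2 * j) % M)

    def retrieval_cost(tiles):
        """Minimum over all replica choices of the maximum load on one device.
        Exponential in len(tiles); enough to illustrate small range queries."""
        best = len(tiles)
        for choice in product(*(replicas(i, j) for i, j in tiles)):
            load = [0] * M
            for d in choice:
                load[d] += 1
            best = min(best, max(load))
        return best

    # A 3 x 4 range query touches 12 tiles, so any schedule needs at least
    # ceil(12 / M) = 3 parallel accesses on some device.
    query = [(i, j) for i in range(3) for j in range(4)]
    print(retrieval_cost(query))

The paper's contribution is, in effect, a placement for which this cost is provably within one of the lower bound for every range query, together with an efficient (non-brute-force) scheduling algorithm.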
The growth of the Internet opens up tremendous opportunities for cooperative computation, where the answer depends on the private inputs of separate entities. Sometimes these computations may occur between mutually untrusted entities. The problem is trivial if the context allows these computations to be conducted by a trusted entity that knows the inputs from all the participants; however, if the context disallows this, then the techniques of secure multi-party computation become very relevant and can provide useful solutions. Statistical analysis is a widely used computation in real life, but the known methods usually require one to know the whole data set; little work has been conducted to investigate how statistical analysis could be performed in a cooperative environment, where the participants want to conduct statistical analysis on the joint data set but each participant is concerned about the confidentiality of its own data. In this paper we develop protocols for conducting statistical analysis in such a cooperative environment, based on a data perturbation technique and cryptographic primitives.
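The specific protocols are not reproduced in the abstract; as a flavor of the general approach, here is a textbook additive-secret-sharing secure-sum sketch (written here for illustration, not the paper's construction): each party splits its value into random shares, so only the aggregate is ever reconstructed, from which public statistics such as the mean follow.

    # Textbook secure-sum sketch via additive secret sharing; shown only to
    # illustrate cooperative statistics without revealing individual inputs.
    # Not the paper's protocol; names and the modulus are assumptions.
    import secrets

    P = 2**61 - 1  # public modulus, larger than any possible sum

    def share(value, n):
        """Split value into n random additive shares modulo P."""
        parts = [secrets.randbelow(P) for _ in range(n - 1)]
        parts.append((value - sum(parts)) % P)
        return parts

    def secure_sum(private_values):
        n = len(private_values)
        all_shares = [share(v, n) for v in private_values]        # party i splits its input
        partials = [sum(all_shares[i][j] for i in range(n)) % P   # party j sums what it received
                    for j in range(n)]
        return sum(partials) % P                                  # only the total is reconstructed

    values = [12, 40, 7]                      # each value held by a different party
    total = secure_sum(values)
    print(total, total / len(values))         # 59 and the mean 19.66..., both public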
Supply-chain interactions have huge economic importance, yet these interactions are managed inefficiently. One of the major sources of inefficiency in supply-chain management is information asymmetry; i.e., information that is available to one or more organizations in the chain (e.g., manufacturer, retailer) is not available to others. There are several causes of information asymmetry, among them the fear that a powerful buyer or supplier will take advantage of private information, or that information will leak to a competitor. We propose Secure Supply-Chain Collaboration (SSCC) protocols that enable supply-chain partners to cooperatively achieve desired system-wide goals without revealing the private information of any of the parties. Secure supply-chain collaboration has the potential to improve supply-chain management practice and, by removing one major inefficiency therein, improve productivity. We present specific SSCC protocols for two types of supply-chain interactions: capacity allocation and e-auctions. In the course of doing so, we design techniques that are of independent interest and are likely to be useful in the design of future SSCC protocols.
This paper discusses inherent vulnerabilities of digital watermarking that affect its mainstream purpose of rights protection. We ask: how resistant is watermarking to uninformed attacks?
There is a multitude of application scenarios for watermarking and, with the advent of modern content distribution networks and the associated rights-assessment issues, it has recently become a topic of increasing interest. But how well is watermarking suited for this main purpose of rights protection? Existing watermarking techniques are vulnerable to attacks threatening their overall viability. Most of these attacks have the final goal of removing the watermarking information while preserving the actual value of the watermarked Work.
In this paper we identify, for a broad class of watermarking algorithms, an inherent trade-off between two important properties: being “convincing enough” in court while at the same time surviving a set of attacks. We show that there exist inherent limitations in protecting rights over digital Works. In the attempt to become as convincing as possible (e.g., a low rate of false positives in a court of law), watermarking applications become more fragile to attacks aimed at removing the watermark while preserving the value of the Work. They are thus necessarily characterized by a significant (in some cases 35% or more) non-zero probability of being successfully attacked without any knowledge of their algorithmic details. We quantify this vulnerability for a class of algorithms and show how a vulnerability-minimizing “sweet spot” can be found. We then derive a set of recommendations for watermarking algorithm design.
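As an illustrative model of this trade-off (an assumption for exposition, not the paper's exact formulation), consider a detector that declares a mark present when at least T of n embedded mark bits match. On unmarked data the bits match by chance, while an attack flips each embedded bit independently with probability p:

\[
P_{\mathrm{fp}}(T) \;=\; \sum_{k=T}^{n} \binom{n}{k} 2^{-n},
\qquad
P_{\mathrm{survive}}(T) \;=\; \sum_{k=T}^{n} \binom{n}{k} (1-p)^{k} p^{\,n-k}.
\]

Raising the threshold T drives the false-positive rate down (more convincing in court) but also lowers the probability of surviving the attack; the “sweet spot” mentioned above is the T that balances the two under the assumed attack model.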
One common revenue model in Content Distribution Networks (CDNs) requires the CDN operator to provide access statistics (e.g., hits, transferred bytes) to the content provider, who in turn is expected to deliver payment dependent on these reported values. An implicit assumption of self-regulated truthfulness of the CDN operator governs this process. The content provider has to trust that the content distributor provides accurate numbers and does not artificially “inflate” them. This type of one-sided accounting is not well tolerated in two-party business interactions. An independent accuracy proof is preferred.
Here we present a provably secure verification mechanism for access accounting in this framework. Our solution exploits one of the common enabling mechanisms of CDN location awareness: dynamic DNS. We discuss several variations and analyze associated attacks.
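The abstract names dynamic DNS as the enabling mechanism but does not give the construction; purely as an assumed illustration of the accounting-verification idea, suppose the provider can observe a known fraction p of requests (for example, by steering them to an endpoint it controls) and cross-checks the CDN-reported total against the audited sample:

    # Assumed illustration of audit-by-sampling verification; not the paper's
    # mechanism. `p` is the fraction of requests observable by the provider.
    import math

    def consistent(reported_total, audited_hits, p, z=3.0):
        """Accept the reported total if it lies within z standard deviations of
        the estimate extrapolated from the audited sample (binomial model)."""
        estimate = audited_hits / p
        sigma = math.sqrt(reported_total * p * (1 - p)) / p
        return abs(reported_total - estimate) <= z * sigma

    print(consistent(reported_total=100_000, audited_hits=5_050, p=0.05))  # True
    print(consistent(reported_total=100_000, audited_hits=3_000, p=0.05))  # False: inflated report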
We describe novel methods of watermarking data using quadratic residues and random numbers. Our methods are fast and generic, and improve the security of the watermark in most known watermarking techniques.
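The construction itself is not given in the abstract; for reference, the number-theoretic ingredient it names, deciding quadratic residuosity modulo a prime, is the standard Euler-criterion test shown below (generic, not the paper's embedding method):

    # Standard quadratic-residue test via Euler's criterion; this is only the
    # number-theoretic building block, not the watermarking construction itself.
    def is_quadratic_residue(a: int, p: int) -> bool:
        """True iff a is a nonzero square modulo the odd prime p."""
        a %= p
        return a != 0 and pow(a, (p - 1) // 2, p) == 1

    print(is_quadratic_residue(2, 7))            # True:  3*3 = 9 = 2 (mod 7)
    print(is_quadratic_residue(3, 7))            # False: 3 is not a square mod 7
    print(is_quadratic_residue(10, 1_000_003))   # efficient for large primes as well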
We briefly describe the special number field sieve integer factoring algorithm, emphasizing the polynomial selection, and tell how we have used it to factor large integers on many workstations.
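A standard illustration of what polynomial selection means in the special number field sieve (a generic example of the kind used for the ninth Fermat number, not necessarily the one in the paper): for an N of special form one chooses a polynomial f with tiny coefficients and an integer m with f(m) ≡ 0 (mod N), e.g.

\[
N = 2^{512} + 1, \qquad f(x) = x^{5} + 8, \qquad m = 2^{103},
\]
\[
f(m) \;=\; 2^{515} + 8 \;=\; 8\,(2^{512} + 1) \;\equiv\; 0 \pmod{N},
\]

so the algebraic-side values stay far smaller than a general-purpose polynomial would allow, which is what makes the special sieve so much faster on numbers of this form.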
We report the factorization of a 135-digit integer by the three large prime variation of the multiple polynomial quadratic sieve, the largest factorization ever performed with MPQS. We show that it is worthwhile to use three large primes, contrary to previous work.
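All quadratic-sieve variants, including MPQS with large primes, ultimately combine relations into a congruence of squares; for readers who want that core step made concrete, here is a generic toy-sized reminder, not the paper's implementation:

    # Congruence-of-squares step that every quadratic-sieve variant relies on;
    # generic reminder only, not the paper's MPQS implementation.
    from math import gcd

    def factor_from_congruence(x, y, n):
        """Given x^2 = y^2 (mod n) with x != +/-y (mod n), return a proper factor."""
        assert (x * x - y * y) % n == 0
        d = gcd(x - y, n)
        return d if 1 < d < n else None

    # Toy example (n = 1649): 41^2 = 32 (mod n), 43^2 = 200 (mod n), and
    # 32 * 200 = 6400 = 80^2, so (41 * 43)^2 = 80^2 (mod n).
    n = 1649
    print(factor_from_congruence((41 * 43) % n, 80, n))   # 17, and 1649 = 17 * 97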
We have completely factored the numerators of the first 76 Bernoulli numbers and the first 44 Euler numbers. We studied the results seeking new theorems about the prime factors of these numbers and rediscovered two nearly-forgotten congruences for the Euler numbers.
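For reference, the Bernoulli numbers whose numerators are being factored can be generated exactly from the standard recurrence sum_{k=0}^{m} C(m+1, k) B_k = 0 for m >= 1 with B_0 = 1; a small exact-arithmetic sketch (generic reference code, not the computation reported in the paper):

    # Exact generation of Bernoulli numbers from the standard recurrence
    # sum_{k=0}^{m} C(m+1, k) * B_k = 0  (m >= 1),  B_0 = 1.
    # Generic reference code, not the factoring computation in the paper.
    from fractions import Fraction
    from math import comb

    def bernoulli(n):
        B = [Fraction(1)]
        for m in range(1, n + 1):
            s = sum(comb(m + 1, k) * B[k] for k in range(m))
            B.append(-s / (m + 1))
        return B

    B = bernoulli(12)
    print(B[12])              # -691/2730
    print(B[12].numerator)    # -691: the kind of numerator whose prime factors are studied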
The goal of the Clouds project at Georgia Tech is the implementation of a fault-tolerant distributed operating system based on the notions of objects and actions, which will provide an environment for the construction of reliable applications. As part of the Clouds project, the author designed and implemented a high-level language in which those levels of the Clouds system above the kernel level are being implemented. The Aeolus language provides access to the synchronization and recovery features of Clouds. It also provides a framework within which to study programming methodologies suitable for action-object systems such as Clouds. This dissertation describes programming methodologies appropriate to the design of fault-tolerant servers needed in the Clouds system. Among the properties needed by these objects are resilience and availability. As part of this research, several case studies, which will serve as designs for actual Clouds servers, have been developed in Aeolus. Among the issues examined using these case studies are: the use of knowledge about the semantics of an object, as opposed to automatic provisions, in designing for resilience and availability; the tradeoffs between consistency and availability for such objects; the support from the Aeolus runtime system and from the Clouds kernel needed for providing fault tolerance; and high-level language features for resilience and availability which may be derived from experience with programming in Aeolus.
One of the problems encountered in distributed systems is how to find the location of the resources needed by a computation. In many situations the location may have to be found at run time, when the resource is accessed; thus the efficiency of the location algorithm will affect the performance of the system. In general, the larger the distributed system, the larger the number of processors at which a resource may reside at the time it is accessed. The general problem of resource location in distributed systems has not been addressed adequately, and most systems have adopted ad hoc solutions without a careful study of the performance of the algorithms used. This thesis studies the problem of finding the location of resources in order to gain a better understanding of the factors affecting the cost of a location algorithm. This study will make it possible to judge proposed algorithms as well as to come up with new ones, optimized for particular systems.