The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

Rethinking Erasure Codes for Cloud Storage

Principal Investigator: Vaneet Aggarwal

Funded by: NSF:NeTS

Storage systems may have caches at the proxy or client ends in order to reduce the latency. However, caching for datacenters where the files are encoded with erasure codes gives rise to new challenges. The current results fall short of addressing the impact of erasure coding on latency and thus fail to providing insights on the optimal caching policy. First, using an (n, k) maximum-distance-separable (MDS) erasure code, a file is encoded into n chunks and can be recovered from any subset of k distinct chunks. Thus, file access latency in such a system is determined by the delay to access file chunks on hot storage nodes with the slowest performance. Significant latency reduction can be achieved by caching only a few hot chunks of each file (and therefore alleviating system performance bottlenecks), whereas caching additional chunks or even complete files only has diminishing benefits. It is an open problem to design a caching policy that optimally apportions limited cache capacity among all files in an erasure coded storage to minimize overall access latency. In this project, we propose a new functional caching approach called Sprout that can efficiently capitalize on existing file coding in erasure-coded storage systems. In contrast to exact caching that stores d chunks identical to original copies, our functional caching approach forms d new data chunks, which together with the existing n chunks satisfy the property of being an (n + d, k) MDS code. Thus, the file can now be recovered from any k out of n + d chunks (rather than k out of n under exact caching), effectively extending coding redundancy, as well as system diversity for scheduling file access requests. The proposed functional caching approach saves latency due to more flexibility to obtain k - d chunks from the storage system at a very minimal additional computational cost of creating the coded cached chunks. To the best of our knowledge, this is the first work studying functional caching for erasure-coded storage and proposing an analytical framework for cache optimization. Based on the arrival rates of different files, placement of file chunks on the servers, and service time distribution of storage servers, an optimal functional caching placement and the access probabilities of the file request from different disks are considered. The proposed algorithm gives significant latency improvement in both simulations and a prototyped solution in an open-source, cloud storage deployment.
 
 

Representative Publications