Rethinking Erasure Codes for Cloud Storage
Research Areas: Network Security
Principal Investigator: Vaneet Aggarwal
Funded by: NSF:NeTS
Storage systems may place caches at the proxy or at the client to reduce latency. However, caching in datacenters where files are encoded with erasure codes gives rise to new challenges. Existing results do not address the impact of erasure coding on latency and thus fail to provide insight into the optimal caching policy. With an (n, k) maximum-distance-separable (MDS) erasure code, a file is encoded into n chunks and can be recovered from any subset of k distinct chunks. File access latency in such a system is therefore determined by the delay to access file chunks on the hot storage nodes with the slowest performance. Significant latency reduction can be achieved by caching only a few hot chunks of each file (thereby alleviating system performance bottlenecks), whereas caching additional chunks or even complete files yields only diminishing benefits. It is an open problem to design a caching policy that optimally apportions limited cache capacity among all files in an erasure-coded storage system to minimize overall access latency.
In this project, we propose a new functional caching approach called Sprout that can efficiently capitalize on the existing file coding in erasure-coded storage systems. In contrast to exact caching, which stores d chunks identical to original copies, our functional caching approach forms d new data chunks which, together with the existing n chunks, satisfy the property of being an (n + d, k) MDS code. Thus, the file can now be recovered from any k out of n + d chunks (rather than k out of n under exact caching), effectively extending coding redundancy as well as system diversity for scheduling file access requests. The proposed functional caching approach reduces latency because it offers more flexibility in obtaining the remaining k - d chunks from the storage system, at a very small additional computational cost for creating the coded cached chunks.
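The (n + d, k) MDS property described above can be illustrated with a small sketch. It is not the coding scheme used by Sprout: it assumes a toy polynomial-evaluation (Reed-Solomon-style) code over the prime field of size 257, and the names `encode`, `decode`, and `P`, as well as the parameters n = 7, k = 4, d = 2, are hypothetical choices made only for this illustration. The k data symbols are treated as polynomial coefficients, the n storage chunks are evaluations at n distinct points, and the d functional cache chunks are evaluations at d further points, so any k of the n + d chunks determine the file.

```python
from itertools import combinations

P = 257  # prime modulus (assumed); every symbol is an integer in [0, P)

def encode(data, points):
    """Return {point: value}: the data polynomial evaluated at each point (mod P)."""
    return {x: sum(c * pow(x, i, P) for i, c in enumerate(data)) % P for x in points}

def decode(chunks, k):
    """Recover the k data symbols from any k chunks by Lagrange interpolation mod P."""
    if len(chunks) < k:
        raise ValueError("need at least k chunks to decode")
    pts = list(chunks.items())[:k]
    coeffs = [0] * k
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1  # basis polynomial coefficients, lowest degree first
        for m, (xm, _) in enumerate(pts):
            if m == j:
                continue
            # Multiply the basis polynomial by (x - xm).
            new = [0] * (len(basis) + 1)
            for i, b in enumerate(basis):
                new[i] = (new[i] - xm * b) % P
                new[i + 1] = (new[i + 1] + b) % P
            basis = new
            denom = denom * (xj - xm) % P
        scale = yj * pow(denom, -1, P) % P  # modular inverse needs Python 3.8+
        for i, b in enumerate(basis):
            coeffs[i] = (coeffs[i] + scale * b) % P
    return coeffs

n, k, d = 7, 4, 2            # hypothetical code parameters
data = [11, 22, 33, 44]      # k data symbols of one file

storage = encode(data, range(1, n + 1))         # n chunks on storage nodes
cache = encode(data, range(n + 1, n + d + 1))   # d new functional cache chunks
combined = {**storage, **cache}                 # together: an (n + d, k) code

# Every k-subset of the n + d chunks should recover the file.
all_subsets_decode = all(
    decode(dict(subset), k) == data for subset in combinations(combined.items(), k)
)

# With the d cached chunks in hand, any k - d storage chunks suffice.
from_cache = decode({**cache, **dict(list(storage.items())[:k - d])}, k)
print(all_subsets_decode, from_cache == data)
```

Under exact caching the cache would instead hold copies of d of the existing evaluations, so a cached chunk adds nothing to a request that also fetches the same chunk from storage; here the cached evaluations are new, which is what lets any k - d storage chunks complete the file.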
To the best of our knowledge, this is the first work to study functional caching for erasure-coded storage and to propose an analytical framework for cache optimization. Based on the arrival rates of different files, the placement of file chunks on the servers, and the service time distribution of the storage servers, we consider the optimal functional cache placement together with the access probabilities of file requests to different disks. The proposed algorithm gives significant latency improvement both in simulations and in a prototype deployed on an open-source cloud storage system.
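The latency benefit of the extra scheduling flexibility can be seen in a deliberately simplified Monte Carlo sketch. It is not the project's analytical framework: it ignores queueing and arrival rates, assumes every storage node is queried in parallel with independent exponential delays, and takes the fastest responses. The function name `simulate`, the per-node mean delays, and the parameters are all hypothetical. Exact caching is given its most favorable case here (the cache holds copies of the chunks on the d slowest nodes), so the remaining k - d chunks must come from the other n - d nodes, whereas functional caching can take the fastest k - d responses from all n nodes.

```python
import random

def kth_smallest(values, kth):
    """Return the kth smallest value (1-indexed)."""
    return sorted(values)[kth - 1]

def simulate(n=7, k=4, d=2, trials=20000, seed=0):
    """Estimate mean file access delay under three caching options (toy model)."""
    rng = random.Random(seed)
    # Hypothetical mean delays per node, ascending; the last two nodes are "hot".
    mean_delay = [1.0] * (n - 2) + [3.0, 5.0]
    need = k - d  # chunks still required from storage when d chunks are cached
    totals = {"none": 0.0, "exact": 0.0, "functional": 0.0}
    for _ in range(trials):
        delays = [rng.expovariate(1.0 / m) for m in mean_delay]
        # No cache: wait for the k fastest of all n nodes.
        totals["none"] += kth_smallest(delays, k)
        # Exact cache holds the chunks of the d slowest nodes: only the
        # other n - d nodes can supply the remaining k - d chunks.
        totals["exact"] += kth_smallest(delays[: n - d], need)
        # Functional cache: any k - d of the n storage chunks will do.
        totals["functional"] += kth_smallest(delays, need)
    return {name: total / trials for name, total in totals.items()}

result = simulate()
print(result)
```

Because each trial uses the same sampled delays for all three options, the functional-caching delay is never larger than the exact-caching delay in this model, which in turn is never larger than the no-cache delay; the size of the gap depends entirely on the assumed delay values.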
- Abubakr O. Al-Abbasi and Vaneet Aggarwal, "Video Streaming in Distributed Erasure-coded Storage Systems: Stall Duration Analysis," IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1921-1932, Aug. 2018.
- Abubakr O. Al-Abbasi and Vaneet Aggarwal, "Stall-Quality Tradeoff for Cloud-based Video Streaming," in Proc. IEEE SPCOM, Jul. 2018.
- Abubakr O. Al-Abbasi and Vaneet Aggarwal, "EdgeCache: An Optimized Algorithm for CDN-based Over-the-top Video Streaming Services," in Proc. Infocom Workshop (International Workshop on Integrating Edge Computing, Caching, and Offloading in Next Generation Networks (IECCO)), Apr. 2018.
- Abubakr O. Al-Abbasi and Vaneet Aggarwal, "Mean Latency Optimization in Erasure-coded Distributed Storage Systems," in Proc. Infocom Workshop (International Workshop on Cloud Computing Systems, Networks, and Applications (CCSNA)), Apr. 2018.
- Vaneet Aggarwal, Yih-Farn Robin Chen, Tian Lan, and Yu Xiang, "Sprout: A functional caching approach to minimize service latency in erasure-coded storage," IEEE/ACM Transactions on Networking, vol. 25, no. 6, pp. 3683-3694, Dec. 2017.
- Vaneet Aggarwal, Jingxian Fan, and Tian Lan, "Taming Tail Latency for Erasure-coded, Distributed Storage Systems," in Proc. IEEE Infocom, May 2017.
- Vaneet Aggarwal, Yih-Farn Robin Chen, Tian Lan, and Yu Xiang, "Sprout: A functional caching approach to minimize service latency in erasure-coded storage," in Proc. IEEE ICDCS, Jun. 2016.
- Yu Xiang, Vaneet Aggarwal, Tian Lan, and Yih-Farn Robin Chen, "Differentiated latency in data center networks with erasure coded files through traffic engineering," accepted to IEEE Transactions on Cloud Computing, Dec. 2016.
- Yu Xiang, Tian Lan, Vaneet Aggarwal, and Yih-Farn Robin Chen, "Optimizing Differentiated Latency in Multi-Tenant, Erasure-Coded Storage," IEEE Transactions on Network and Service Management, vol. 14, no. 1, pp. 204-216, Mar. 2017.
- Chao Tian, Birenjith Sasidharan, Vaneet Aggarwal, Vinay Vaishampayan, and P. Vijay Kumar, "Layered, Exact-Repair Regenerating Codes Via Embedded Error Correction and Block Designs," IEEE Trans. Inf. Th., vol. 61, no. 4, pp. 1933-1947, Apr. 2015.
- Yu Xiang, Tian Lan, Vaneet Aggarwal, and Yih-Farn Robin Chen, "Joint Latency and Cost Optimization for Erasure-coded Data Center Storage," IEEE/ACM Transactions on Networking, vol. 24, no. 4, pp. 2443-2457, Aug. 2016.
- Yu Xiang, Tian Lan, Vaneet Aggarwal, and Yih-Farn Robin Chen, "Joint Latency and Cost Optimization for Erasure-coded Data Center Storage," ACM SIGMETRICS Performance Evaluation Review, vol. 42, no. 2, pp. 3-14, Sept. 2014.
- Yu Xiang, Tian Lan, Vaneet Aggarwal, and Yih-Farn Robin Chen, "Multi-Tenant Latency Optimization in Erasure-Coded Storage with Differentiated Services," in Proc. ICDCS, Jun.-Jul. 2015.
- Yu Xiang, Vaneet Aggarwal, Yih-Farn Robin Chen, and Tian Lan, "Taming latency in data center networking with erasure coded files," in Proc. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, May 2015.
- Vaneet Aggarwal, Chao Tian, Vinay Vaishampayan, and Yih-Farn Robin Chen, "Distributed Data Storage Systems with Opportunistic Repair," in Proc. IEEE Infocom, Apr. 2014.