Data Spillage in Hadoop Clusters
Project Members
Joe Beckman, Tosin Alabi, Dheeraj Gurugubelli
Abstract
Data spillage is the undesired transfer of classified information onto an unauthorized compute node or memory media. The loss of control over sensitive and protected data can pose a serious threat to business operations and national security (NSA Mitigation Group, 2012). We seek to determine whether classified data leaked by user error into an unauthorized Hadoop Distributed File System (HDFS) can be located, recovered, and completely removed from the server.