When conducting research, life scientists rely heavily on clinically annotated specimens, and the most thorough and effective clinical annotations draw on information found in the electronic health records (EHRs) of the human subjects participating in the scientists’ studies. A primary piece of legislation pertinent to electronic health records is the Health Insurance Portability and Accountability Act (HIPAA, 1996). To protect the privacy of human subjects, HIPAA dictates differing levels of access to the information in EHRs based on the roles that researchers play in a particular study; these levels range from full access (including protected health information) to very limited (i.e., public) access. In the case of public access, the data must be de-identified according to criteria set out in the HIPAA legislation, and some of these criteria are stated in general terms to reflect the fluid nature of modern science. Because of these ambiguities, the complex measures often necessary to de-identify protected health information, and the risk of litigation and lost reputation, scientists rarely share their de-identified annotated data beyond their current study.
Unfortunately, this lack of sharing limits the reuse of experimental data beyond its original context, and in turn, this lack of reuse can adversely affect the translational impact of the basic life sciences. This constricting approach to the management of clinical annotations stands in contrast to the move in computing toward the “Cloud,” wherein data are stored for easy retrieval and sharing. In our current study, we are surveying life scientists to ascertain their perceptions of a cloud-based approach to the management of their annotated data.
Health Insurance Portability and Accountability Act of 1996 (HIPAA). (1996). Retrieved July 10, 2009 from http://www.cms.hhs.gov/HIPAAGenInfo/Downloads/HIPAALaw.pdf.
The undergraduate student will conduct a comprehensive literature review and analyze the large data repositories frequently used in the life sciences. One example is the Susan G. Komen Virtual Tissue Bank, the only repository in the world for normal breast tissue and matched serum, plasma and DNA. By studying normal tissue, we accelerate research into the causes and prevention of breast cancer. To understand the evolution of the disease more deeply, it is necessary to compare abnormal, cancerous tissue against normal, healthy tissue.
Student research projects include:
- Characterization of how these large data repositories handle the sensitivity and privacy of the information they store.
- Best practices for designing proteomic, genomic and metabolomic databases to enable data sharing and reuse while managing privacy and security requirements.
Keywords: cloud, health care, big data, privacy, HIPAA