Preserving Genome Privacy in Research Studies

One of the most dramatic measures to improve the quality of health care in America is the comprehensive adoption of electronic health records (EHRs) promoted by the Health Information Technology for Economic and Clinical Health (HITECH) Act. Once rare, EHR systems have now been adopted by more than 8 in 10 office-based physicians. These systems allow healthcare providers and researchers to create and collect large-scale phenotypic data from patients with all manner of health conditions and treatments.

At the same time, the dramatic drop in the cost of whole genome sequencing has made genetic data increasingly available to biomedical research. Health "big data" holds enormous promise to advance public health and promote precision medicine. President Obama recently announced the Precision Medicine Initiative, which will build a national research cohort that integrates data from existing distributed networks and combines large-scale genome and EHR data to discover genotype–phenotype associations that can improve health care for millions.

Yet deficient privacy protection can undermine these benefits by discouraging people from participating in this important research. Sharing EHR and genome data may create re-identification risks or disclose sensitive individual information, such as disease associations or predispositions. Because genome data are partly shared with close relatives, any privacy loss also implicates family members who have not consented. This paper critically evaluates the distinctive re-identification risks of sharing EHR and genome data in health big data research.

The federal Common Rule requires that informed consent be obtained to conduct research involving human subjects. The recent Notice of Proposed Rulemaking to revise that Rule calls for improving informed consent by increasing transparency. The informed consent systems currently used in healthcare research, however, do not adequately assess risks across different use cases. For example, under a blanket informed consent form, research subjects are often told that their data may be used for secondary research without being told the specific risks of these unspecified studies.

Worse, genomic data are largely unprotected under U.S. privacy law. This paper argues that a better understanding of these re-identification risks can significantly facilitate health big data sharing. The privacy challenges of large-scale data sharing for health research require a comprehensive data-governance ecosystem that includes legal, ethical, and technical components. This paper advocates dynamic and concrete privacy protection in health research through improved informed consent measures, federal policy revisions, innovative data use agreements, and the adoption of emerging computer technologies capable of assessing privacy risks and protecting against them.

Publication Info

Medical Data Privacy Handbook, Springer, 2015