Protecting Biological and Health Data: Special Issues and Applications
This page describes protection methods for biological and health data, which
are often covered by the Health Insurance Portability and Accountability Act
(HIPAA). These data typically contain demographic and other potentially
identifying information, along with sensitive health variables.
Most of the typical alteration strategies can be applied to the demographic and
other identifying variables; see the Methods tab at the top of this page for an
explanation of the methods. Below are links to illustrative applications of
confidentiality protections to biological and health data. The list is not
exhaustive, but it illustrates the techniques typically used to protect these
data.
Aggregation and top-coding in the Health and Retirement Study (HRS)
The HRS uses aggregation of categories (e.g., geographies, occupations),
rounding and top-coding (monetary data), and suppression of variables related
to the survey design. These actions result in a restricted-access data file,
which researchers can obtain after applying and agreeing in writing to maintain
data confidentiality.
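As a rough illustration of the three alteration steps named above, the sketch below applies top-coding, rounding, and category aggregation to toy values. The function names, the cap of 500,000, the rounding base of 1,000, and the occupation mapping are all assumptions made for this example; they are not the HRS's actual rules.

```python
def top_code(values, cap):
    """Replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


def round_to(values, base):
    """Round each value to the nearest multiple of `base`."""
    return [base * round(v / base) for v in values]


def aggregate(categories, mapping, default="Other"):
    """Map detailed categories to broader groups; unmapped ones become `default`."""
    return [mapping.get(c, default) for c in categories]


# Hypothetical monetary values and occupation codes (not real survey data).
incomes = [12340, 58900, 1250000]
occupations = ["nurse", "surgeon", "welder"]

# Hypothetical coarsening of occupations into broad groups.
occupation_groups = {"nurse": "Health care", "surgeon": "Health care"}

protected_incomes = round_to(top_code(incomes, cap=500000), base=1000)
protected_occupations = aggregate(occupations, occupation_groups)

print(protected_incomes)      # [12000, 59000, 500000]
print(protected_occupations)  # ['Health care', 'Health care', 'Other']
```

Top-coding is applied before rounding here so that the cap itself is the largest value that can appear in the released file.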
Noise addition and synthetic data in the National Health Interview Survey Linked Mortality Files
For each person deemed at risk of identification, staff at the Centers for
Disease Control and Prevention (CDC) either add noise to the date of death or
generate a synthetic value of the underlying cause of death (after aggregating
death codes). The results from the perturbed and original data are compared in
a 2008 paper in the American Journal of Epidemiology (volume 168, pages
336-344).
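The noise-addition step can be sketched as follows. This is a generic illustration only: the source does not describe the noise distribution actually used, so the uniform shift of up to 30 days, the `at_risk` flag, and the record layout are all assumptions. The synthetic cause-of-death step is not shown.

```python
import random
from datetime import date, timedelta


def add_date_noise(d, max_shift_days, rng):
    """Shift date `d` by a random whole number of days in [-max, +max]."""
    shift = rng.randint(-max_shift_days, max_shift_days)
    return d + timedelta(days=shift)


def protect_records(records, max_shift_days, rng):
    """Return copies of the records, perturbing dates only for at-risk persons."""
    protected = []
    for rec in records:
        new_rec = dict(rec)  # copy so the original record is left unchanged
        if rec["at_risk"]:
            new_rec["date_of_death"] = add_date_noise(
                rec["date_of_death"], max_shift_days, rng
            )
        protected.append(new_rec)
    return protected


# Hypothetical records; the fields are invented for this example.
records = [
    {"id": 1, "date_of_death": date(2001, 3, 14), "at_risk": True},
    {"id": 2, "date_of_death": date(1999, 11, 2), "at_risk": False},
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
protected = protect_records(records, max_shift_days=30, rng=rng)
```

Only the record flagged as at risk can change; the other date passes through untouched, which mirrors the idea of perturbing selected records rather than the whole file.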
Data swapping and microaggregation in the Substance Abuse and Mental Health Data Archive (SAMHDA)
The Inter-university
Consortium for Political and Social Research (ICPSR) archives and safeguards
many datasets, including the SAMHDA. The ICPSR uses data swapping and
microaggregation to protect records in these data.
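To make the two techniques concrete, here is a minimal sketch of each on a single numeric variable. It is not the ICPSR's procedure: the group size `k`, the number of swapped pairs, and the simple sort-and-group rule are assumptions chosen for illustration.

```python
import random


def microaggregate(values, k):
    """Sort values, form groups of at least k, and replace each value
    with the mean of its group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Fold a short final group into the previous one so no group has
    # fewer than k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


def swap_values(values, n_pairs, rng):
    """Exchange the values of `n_pairs` randomly chosen pairs of records."""
    swapped = list(values)
    chosen = rng.sample(range(len(values)), 2 * n_pairs)
    for a, b in zip(chosen[::2], chosen[1::2]):
        swapped[a], swapped[b] = swapped[b], swapped[a]
    return swapped


# Hypothetical ages for seven records.
ages = [30, 10, 20, 40, 50, 60, 70]

print(microaggregate(ages, k=3))
# [20.0, 20.0, 20.0, 55.0, 55.0, 55.0, 55.0]

swapped_ages = swap_values(ages, n_pairs=2, rng=random.Random(0))
```

Microaggregation preserves the overall mean of the variable, while swapping preserves its entire marginal distribution and only changes which record holds which value.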
The Personal Genome Project (PGP)
Genetic data are extremely difficult to
protect without substantial sacrifice in data usefulness. Researchers at the PGP
at Harvard University have taken a different approach: ask individuals to
consent to make their genetic data available to the public without modification.
Although the data are not protected, we include this link as an example of an
alternative approach to data access.