Biological and Health Data

Protecting Biological and Health Data: Special Issues and Applications

This page describes protection methods for biological and health data, which are often protected under the Health Insurance Portability and Accountability Act (HIPAA). These data typically contain demographic and other potentially identifying information, along with sensitive health variables. Most of the typical alteration strategies can be applied to the demographic and other identifying data; see the Methods tab at the top of this page for an explanation of the methods. Below are links to illustrative applications of confidentiality protections on biological and health data. This list is by no means exhaustive, but it does illustrate the techniques typically used to protect these data.

Aggregation and top-coding in the Health and Retirement Study (HRS)
The HRS uses aggregation of categories (e.g., geographies, occupations), rounding and top-coding of monetary data, and suppression of variables related to the survey design. These steps produce a restricted-access data file, which researchers can obtain after applying and signing an agreement to maintain data confidentiality.
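
To make top-coding and rounding concrete, here is a minimal Python sketch. It is not the HRS procedure: the function names, the cap of 150000, the rounding base of 1000, and the income values are all hypothetical and chosen only for illustration.

```python
def top_code(values, cap):
    """Replace every value above `cap` with `cap` itself (top-coding)."""
    return [min(v, cap) for v in values]


def round_to(values, base):
    """Round each value to the nearest multiple of `base`.

    Note: Python's round() sends exact halves to the nearest even integer.
    """
    return [base * round(v / base) for v in values]


# Hypothetical monetary values; the two largest are unusual enough to identify someone.
incomes = [18250, 42900, 61300, 250000, 1200000]

# Assumed cap and rounding base, for illustration only.
protected = round_to(top_code(incomes, cap=150000), base=1000)
print(protected)  # [18000, 43000, 61000, 150000, 150000]
```

After top-coding, the two extreme incomes are indistinguishable from each other, and rounding removes the exact amounts that could be matched against outside records.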

Noise addition and synthetic data in the National Health Interview Survey Linked Mortality Files
For each person deemed at risk of identification, staff at the Centers for Disease Control and Prevention either add noise to the date of death or generate a synthetic value of the underlying cause of death (after aggregating death codes). Results from the perturbed and original data are compared in a 2008 paper in the American Journal of Epidemiology (volume 168, pages 336-344).
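
As a rough illustration of noise addition applied to dates, the sketch below shifts the date of death by a random number of days, but only for records flagged as at risk. It is an assumption-laden example, not the procedure used for the linked mortality files: the 30-day window, the `at_risk` flag, the record layout, and the function name `perturb_date` are all hypothetical.

```python
import random
from datetime import date, timedelta


def perturb_date(d, max_shift_days, rng):
    """Shift a date by a random whole number of days in [-max_shift_days, +max_shift_days]."""
    shift = rng.randint(-max_shift_days, max_shift_days)
    return d + timedelta(days=shift)


# Seeded generator so the sketch is reproducible; a real release would not publish the seed.
rng = random.Random(0)

# Hypothetical records; `at_risk` stands in for whatever risk assessment is used.
records = [
    {"id": 1, "date_of_death": date(2004, 3, 14), "at_risk": True},
    {"id": 2, "date_of_death": date(2005, 11, 2), "at_risk": False},
]

released = []
for r in records:
    out = dict(r)  # copy so the original record is left untouched
    if r["at_risk"]:
        out["date_of_death"] = perturb_date(r["date_of_death"], 30, rng)
    released.append(out)
```

Records that are not flagged pass through unchanged, so the perturbation only affects analyses involving the at-risk cases.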

Data swapping and microaggregation in the Substance Abuse and Mental Health Data Archive (SAMHDA)
The Inter-university Consortium for Political and Social Research (ICPSR) archives and safeguards many datasets, including the SAMHDA. The ICPSR uses data swapping and microaggregation to protect records in these data.
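
The two techniques named here can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the ICPSR's implementation: the group size `k`, the random pairing rule, the record fields, and the function names are hypothetical. Microaggregation is shown in its simplest univariate form (sort, group, replace each value with its group mean), and swapping is shown as a random exchange of one attribute between paired records.

```python
import random


def microaggregate(values, k):
    """Replace each value with the mean of its group of at least k nearest-ranked values."""
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # Fold a short final group into the previous one so every group has >= k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())
    result = [None] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


def swap_attribute(records, key, rng):
    """Randomly pair records and exchange their `key` values; an odd record out is unchanged."""
    idx = list(range(len(records)))
    rng.shuffle(idx)
    out = [dict(r) for r in records]
    for a, b in zip(idx[0::2], idx[1::2]):
        out[a][key], out[b][key] = out[b][key], out[a][key]
    return out


ages = [23, 67, 25, 70, 24, 45, 44]
print(microaggregate(ages, k=3))  # [24.0, 56.5, 24.0, 56.5, 24.0, 56.5, 56.5]

# Hypothetical records: swap the geography field, leave everything else in place.
people = [{"id": i, "county": c} for i, c in enumerate(["A", "B", "C", "D"])]
swapped = swap_attribute(people, "county", random.Random(1))
```

Microaggregation preserves the overall mean of the variable, and swapping preserves the marginal distribution of the swapped attribute, while both weaken the link between a record and its original values.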

The Personal Genome Project (PGP)
Genetic data are extremely difficult to protect without substantial sacrifice in data usefulness. Researchers at the PGP at Harvard University have taken a different approach: they ask individuals to consent to making their genetic data available to the public without modification. Although the data are not protected, we include this link as an alternative approach to data access.