Protecting Demographic/Other Data: Special Issues and Applications
This page discusses protection methods for demographic, educational, and other
data that are neither health nor tax data. These data typically contain
information that may also be available to the public, such as geography, age,
race, gender, marital status, and property taxes. If these variables are released
without alteration, malicious data users may be able to attach names to the
records by matching them to external data sources. For example, Sweeney (1997)
showed that 97% of the records in a medical database for Cambridge, MA, could be
identified using only birth date and 9-digit ZIP code by linking them to a
publicly available voter registration list. However, these data are often samples
rather than censuses. Sampling protects individuals because an intruder cannot be
certain whether a targeted individual is included in the data.
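The linkage attack described above can be sketched as a join on quasi-identifiers. This is a minimal illustration, not Sweeney's procedure: the records, names, and the helper `link` are hypothetical, and a match is accepted only when the birth date and ZIP combination is unique in both files.

```python
from collections import Counter

# Hypothetical "de-identified" release: names removed, quasi-identifiers kept.
released = [
    {"birth_date": "1945-07-31", "zip9": "02138-1234", "diagnosis": "A"},
    {"birth_date": "1960-01-15", "zip9": "02139-5678", "diagnosis": "B"},
    {"birth_date": "1960-01-15", "zip9": "02139-5678", "diagnosis": "C"},
    {"birth_date": "1982-03-02", "zip9": "02140-0001", "diagnosis": "D"},
]

# Hypothetical public list carrying names alongside the same fields.
voter_list = [
    {"name": "Voter 1", "birth_date": "1945-07-31", "zip9": "02138-1234"},
    {"name": "Voter 2", "birth_date": "1960-01-15", "zip9": "02139-5678"},
    {"name": "Voter 3", "birth_date": "1990-12-12", "zip9": "02141-9999"},
]

QUASI_IDS = ("birth_date", "zip9")


def key(record):
    """Tuple of quasi-identifier values used as the join key."""
    return tuple(record[q] for q in QUASI_IDS)


def link(released, external):
    """Return (name, released_record) pairs whose key is unique in BOTH files."""
    released_counts = Counter(key(r) for r in released)
    external_counts = Counter(key(e) for e in external)
    names = {key(e): e["name"] for e in external}
    matches = []
    for r in released:
        k = key(r)
        if released_counts[k] == 1 and external_counts.get(k) == 1:
            matches.append((names[k], r))
    return matches


for name, record in link(released, voter_list):
    print(name, "->", record["diagnosis"])  # Voter 1 -> A
```

Voter 2 is not linked because two released records share that key. Note also that a key unique within a sample need not be unique in the population, which is why sampling gives some protection: the apparent match may belong to someone who was never sampled.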
Most typical alteration strategies can be applied to demographic/other data; see
the web page on data protection methods for an explanation of each method.
Agencies often apply multiple methods to the same dataset. Below are links to
illustrative applications of confidentiality protection to demographic/other
data. This list is by no means exhaustive, but it does illustrate the techniques
typically used to protect these data.
Aggregation of geography in UK Census Data
Most statistical agencies and other data disseminators aggregate geography
before public release. One example is the "output areas" used for the UK census;
in England and Wales, the census is conducted by the Office for National
Statistics (ONS).
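Geographic aggregation can be sketched as recoding fine units to coarser areas and pooling any area that is still too small. This is an illustration only, not how ONS builds output areas: the lookup table, the area codes, and the threshold `MIN_COUNT` are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup from fine geographic units to coarser released areas.
FINE_TO_AREA = {
    "blk01": "areaA", "blk02": "areaA", "blk03": "areaA",
    "blk04": "areaB", "blk05": "areaB",
    "blk06": "areaC",
}

MIN_COUNT = 3  # hypothetical minimum number of records per released area


def aggregate_geography(records, lookup, min_count, residual="OTHER"):
    """Recode fine geography to its coarser area, then pool areas with
    fewer than min_count records into a single residual category."""
    coarse = [dict(r, geo=lookup[r["geo"]]) for r in records]  # copies; input untouched
    counts = Counter(r["geo"] for r in coarse)
    for r in coarse:
        if counts[r["geo"]] < min_count:
            r["geo"] = residual
    return coarse


records = [{"id": i, "geo": g} for i, g in enumerate(
    ["blk01", "blk01", "blk02", "blk03", "blk04", "blk05", "blk06"])]
released = aggregate_geography(records, FINE_TO_AREA, MIN_COUNT)
print(Counter(r["geo"] for r in released))
```

Here areaB and areaC fall below the threshold and are pooled. In general the residual pool can itself be too small, in which case a real procedure would need further merging.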
Top-coding in the American Community Survey (ACS)
Top-coding is the most common approach to protecting income and other monetary
data. The linked page describes the top-codes used in the ACS Public Use
Microdata Sample (PUMS) files; the ACS is conducted by the U.S. Census Bureau.
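A minimal top-coding sketch follows. The threshold and incomes are hypothetical, and the function `top_code` is illustrative; it does not reproduce the ACS top-codes or the Census Bureau's exact replacement rule. It shows two common variants: replacing extreme values with the threshold, or with the mean of all values above it, which keeps the column total unchanged.

```python
def top_code(values, threshold, replace_with="threshold"):
    """Top-code values above `threshold`.

    replace_with="threshold": each value above the threshold becomes the threshold.
    replace_with="mean": each value above the threshold becomes the mean of all
    values above it, so the sum of the column is preserved.
    """
    if replace_with not in ("threshold", "mean"):
        raise ValueError("replace_with must be 'threshold' or 'mean'")
    above = [v for v in values if v > threshold]
    if replace_with == "mean" and above:
        substitute = sum(above) / len(above)
    else:
        substitute = threshold
    return [substitute if v > threshold else v for v in values]


incomes = [25_000, 48_000, 61_000, 350_000, 1_200_000]  # hypothetical values
print(top_code(incomes, 300_000))          # [25000, 48000, 61000, 300000, 300000]
print(top_code(incomes, 300_000, "mean"))  # [25000, 48000, 61000, 775000.0, 775000.0]
```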
Data swapping and the National Center for Education Statistics (NCES)
The NCES uses data swapping to create restricted-access files that are available
to users through licensing. This JSM (Joint Statistical Meetings) proceedings
paper describes research by Westat on the swapping procedures used by NCES.
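The basic idea of data swapping can be sketched as exchanging a geographic value between pairs of records that agree on a matching variable. This is not the NCES or Westat procedure: the function `swap_geography`, the matching variable `hh_size`, the swap rate, and the records are hypothetical, and real implementations usually target at-risk records rather than choosing at random.

```python
import random
from collections import Counter, defaultdict


def swap_geography(records, match_var, swap_rate, seed=0):
    """Swap the 'geo' value between pairs of records that agree on match_var
    but lie in different geographies.

    Returns (swapped_records, swapped_indices); the input list is not modified.
    Because partners share match_var, counts of (geo, match_var) are preserved.
    """
    rng = random.Random(seed)
    out = [dict(r) for r in records]
    groups = defaultdict(list)  # match_var value -> record indices
    for i, r in enumerate(out):
        groups[r[match_var]].append(i)

    target = round(swap_rate * len(out))  # approximate number of records to swap
    swapped = set()
    order = list(range(len(out)))
    rng.shuffle(order)
    for i in order:
        if len(swapped) >= target:
            break
        if i in swapped:
            continue
        partners = [j for j in groups[out[i][match_var]]
                    if j not in swapped and j != i and out[j]["geo"] != out[i]["geo"]]
        if not partners:
            continue
        j = rng.choice(partners)
        out[i]["geo"], out[j]["geo"] = out[j]["geo"], out[i]["geo"]
        swapped.update((i, j))
    return out, swapped


# Hypothetical records; household size is the matching variable.
records = [
    {"id": 1, "hh_size": 1, "geo": "X", "income": 30_000},
    {"id": 2, "hh_size": 1, "geo": "Y", "income": 52_000},
    {"id": 3, "hh_size": 1, "geo": "X", "income": 41_000},
    {"id": 4, "hh_size": 1, "geo": "Y", "income": 75_000},
    {"id": 5, "hh_size": 3, "geo": "X", "income": 88_000},
    {"id": 6, "hh_size": 3, "geo": "Y", "income": 64_000},
    {"id": 7, "hh_size": 3, "geo": "X", "income": 57_000},
    {"id": 8, "hh_size": 3, "geo": "Y", "income": 91_000},
]
swapped_records, idx = swap_geography(records, "hh_size", swap_rate=0.5)
print(sorted(idx))
```

The design trade-off is visible in the docstring: tables of geography by household size are unchanged, but the link between geography and other variables (here, income) is perturbed for swapped records.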
Failure to protect confidentiality in educational data
This document describes several examples from published educational data in
which suppression fails to protect confidentiality.
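One common way suppression fails is recovery by subtraction: if only one cell in a row is suppressed and the row total is published, the hidden value follows directly. The sketch below is a hypothetical illustration and may differ from the examples in the linked document; the table values and the helper `recover_suppressed` are invented.

```python
def recover_suppressed(cells, total):
    """Given published cells (None = suppressed) and a published total,
    return {cell: value} if exactly one cell is suppressed, else None."""
    hidden = [k for k, v in cells.items() if v is None]
    if len(hidden) != 1:
        return None  # subtraction alone cannot pin down a single value
    known = sum(v for v in cells.values() if v is not None)
    return {hidden[0]: total - known}


# Hypothetical school report: a small cell is suppressed, but the total is shown.
published = {"Below basic": None, "Basic": 41, "Proficient": 57, "Advanced": 12}
published_total = 112
print(recover_suppressed(published, published_total))  # {'Below basic': 2}
```

Suppressing a second, complementary cell defeats this single-table subtraction, although linked tables (for example, the same counts by grade or by subgroup) can still narrow the possible values.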
Synthetic data in OnTheMap (LEHD)
The U.S. Census Bureau produces maps of where people live and work using
partially synthetic data combined with other techniques, such as suppression and
noise addition.
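Partially synthetic data replace a sensitive variable with model draws while releasing the other variables as observed. The sketch below synthesizes a home area conditional on workplace by sampling from a Dirichlet posterior over observed home counts; it illustrates the general technique only and is not the Census Bureau's production synthesizer. The function names, the `prior` value, and the records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def dirichlet(rng, alphas):
    """One draw from Dirichlet(alphas), via normalized Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def synthesize_home(records, home_areas, prior=0.5, seed=0):
    """Replace each record's 'home' with a synthetic draw conditional on 'work'.

    For each workplace, sample home-area probabilities from the Dirichlet
    posterior (observed home counts + prior), then draw a synthetic home for
    every worker there. 'work' and all other fields are released unchanged.
    """
    rng = random.Random(seed)
    by_work = defaultdict(list)
    for i, r in enumerate(records):
        by_work[r["work"]].append(i)
    out = [dict(r) for r in records]  # copies; input untouched
    for work, idxs in by_work.items():
        counts = Counter(records[i]["home"] for i in idxs)
        alphas = [counts[h] + prior for h in home_areas]
        probs = dirichlet(rng, alphas)
        for i in idxs:
            out[i]["home"] = rng.choices(home_areas, weights=probs, k=1)[0]
    return out


HOME_AREAS = ["H1", "H2", "H3"]  # hypothetical residence areas
records = [
    {"work": "W1", "home": "H1"},
    {"work": "W1", "home": "H1"},
    {"work": "W1", "home": "H2"},
    {"work": "W2", "home": "H3"},
    {"work": "W2", "home": "H3"},
]
synthetic = synthesize_home(records, HOME_AREAS, seed=1)
print(synthetic)
```

The prior spreads some probability to home areas never observed for a workplace, so a worker's true residence is not simply reproduced; a larger prior gives more protection at the cost of less faithful commuting patterns.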