
Protecting Demographic/Other Data: Special Issues and Applications

This page discusses protection methods for demographic, educational, and other data types that are neither health nor tax data. These data typically contain variables whose values may be available to the public from other sources, such as geography, age, race, gender, marital status, and property taxes. If these variables are released without alteration, malicious data users may be able to attach names to the records by matching them to external data sources. For example, Sweeney (1997) showed that 97% of the records in a medical database for Cambridge, MA, could be identified using only birth date and 9-digit ZIP code by linking them to a publicly available voter registration list. However, these data are often samples rather than censuses. Sampling offers some protection because an intruder cannot be certain that a targeted individual is included in the data.
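A toy version of the linkage attack described above can make the mechanism concrete. The sketch below is an illustration under stated assumptions, not Sweeney's procedure: the records, names, and the helper `link_unique` are hypothetical, and a record counts as re-identified only when its combination of quasi-identifiers (birth date and 9-digit ZIP code) is unique in both files.

```python
from collections import Counter

# Hypothetical "de-identified" file: names removed, quasi-identifiers kept.
medical = [
    {"dob": "1950-07-31", "zip9": "02138-1234", "diagnosis": "A"},
    {"dob": "1962-01-15", "zip9": "02139-5678", "diagnosis": "B"},
    {"dob": "1962-01-15", "zip9": "02139-5678", "diagnosis": "C"},
    {"dob": "1988-11-02", "zip9": "02140-0001", "diagnosis": "D"},
]

# Hypothetical public list that carries names plus the same fields.
voters = [
    {"name": "Voter 1", "dob": "1950-07-31", "zip9": "02138-1234"},
    {"name": "Voter 2", "dob": "1962-01-15", "zip9": "02139-5678"},
    {"name": "Voter 3", "dob": "1962-01-15", "zip9": "02139-5678"},
    {"name": "Voter 4", "dob": "1988-11-02", "zip9": "02140-0001"},
]

KEY = ("dob", "zip9")  # quasi-identifiers shared by both files


def key_of(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[k] for k in KEY)


def link_unique(released, external):
    """Map each external name to a released record when their shared key
    occurs exactly once in each file."""
    released_counts = Counter(key_of(r) for r in released)
    external_counts = Counter(key_of(r) for r in external)
    by_key = {key_of(r): r for r in released}  # only used for unique keys
    matches = {}
    for person in external:
        k = key_of(person)
        if released_counts[k] == 1 and external_counts[k] == 1:
            matches[person["name"]] = by_key[k]
    return matches


linked = link_unique(medical, voters)
for name, rec in linked.items():
    print(name, "->", rec["diagnosis"])
# Voter 1 -> A
# Voter 4 -> D
```

Voter 2 and Voter 3 share a key, so neither can be linked. Requiring uniqueness in the external file matters: a key that is unique only within a sample may belong to other people in the population, which is part of the protection sampling provides.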

Most of the standard alteration strategies can be applied to demographic/other data; see the web page on data protection methods for an explanation of each method. Agencies often apply several methods to the same dataset. Below are links to illustrative applications of confidentiality protection to demographic/other data. The list is by no means exhaustive, but it illustrates the techniques typically used to protect these data.

Aggregation of geography in UK Census Data
Most statistical agencies and other data disseminators aggregate geography before public release. One example is the "output areas" used for the UK census, which is conducted by the UK Office for National Statistics; output areas are the smallest geographic units for which census statistics are published.
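The basic mechanics of geographic aggregation can be sketched as recoding fine units into coarser ones and then checking that every released area is large enough. This is a hypothetical illustration, not the ONS procedure for building output areas: the mapping `FINE_TO_COARSE`, the helper names, and the threshold `MIN_PERSONS = 100` are all assumptions.

```python
from collections import Counter

# Hypothetical mapping from fine geographic units to the coarser areas
# that will appear in the public file.
FINE_TO_COARSE = {
    "block_01": "area_A", "block_02": "area_A", "block_03": "area_A",
    "block_04": "area_B", "block_05": "area_B",
}
MIN_PERSONS = 100  # hypothetical minimum population per released area


def aggregate_geography(records, mapping):
    """Return copies of the records with fine geography recoded to coarse areas."""
    return [{**r, "geo": mapping[r["geo"]]} for r in records]


def small_areas(records, minimum):
    """List released areas whose record count is below the minimum."""
    counts = Counter(r["geo"] for r in records)
    return sorted(area for area, n in counts.items() if n < minimum)


# Hypothetical person counts per fine unit.
sizes = {"block_01": 60, "block_02": 50, "block_03": 20,
         "block_04": 40, "block_05": 30}
records = [{"geo": block} for block, n in sizes.items() for _ in range(n)]

released = aggregate_geography(records, FINE_TO_COARSE)
print(small_areas(released, MIN_PERSONS))  # ['area_B']
```

Areas that fail the check would need to be merged further or otherwise protected before release.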

Top-coding in the American Community Survey (ACS)
Top-coding is the most common approach to protecting income and other monetary data. This link describes the top-codes used in the ACS public use microdata samples; the ACS is conducted by the U.S. Census Bureau.
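Top-coding can be sketched in a few lines. This is a minimal illustration, not the ACS rule: the function `top_code`, the threshold, and the income values are hypothetical. It shows two variants, capping values at the threshold and replacing them with the mean of the values above it.

```python
def top_code(values, threshold, use_mean=False):
    """Top-code values above `threshold`.

    use_mean=False: replace each value above the threshold with the threshold.
    use_mean=True:  replace them with the mean of the values above the
                    threshold, which keeps their sum unchanged.
    """
    above = [v for v in values if v > threshold]
    if not above:
        return list(values)
    replacement = sum(above) / len(above) if use_mean else threshold
    return [replacement if v > threshold else v for v in values]


incomes = [25_000, 48_000, 61_000, 90_000, 410_000, 1_250_000]
print(top_code(incomes, 200_000))
# [25000, 48000, 61000, 90000, 200000, 200000]
print(top_code(incomes, 200_000, use_mean=True))
# [25000, 48000, 61000, 90000, 830000.0, 830000.0]
```

The mean variant preserves totals, which helps users who compute sums or averages. If only one value exceeds the threshold, however, the mean variant returns that value unchanged, so it protects only when several records are top-coded.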

Data swapping and the National Center for Education Statistics (NCES)
The NCES uses data swapping to create restricted-use files that are available to users under a license. This JSM proceedings paper describes research by Westat on the swapping procedures used by NCES.
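Data swapping can be illustrated by exchanging a geography variable between pairs of records that agree on a set of matching variables. This sketch uses simple random pairing within strata and is not the NCES or Westat procedure; the function `swap_geography`, the field names, and the swap rate are hypothetical.

```python
import random
from collections import defaultdict


def swap_geography(records, match_keys, swap_rate, seed=0):
    """Return a copy of `records` in which roughly `swap_rate` of the records
    in each stratum exchange their "geo" value with a partner that agrees on
    every field in `match_keys`. Partners that already share a geography
    are left effectively unchanged."""
    rng = random.Random(seed)
    out = [dict(r) for r in records]  # never modify the caller's records
    # Group record indices by their matching-variable values.
    groups = defaultdict(list)
    for i, r in enumerate(out):
        groups[tuple(r[k] for k in match_keys)].append(i)
    for idx in groups.values():
        rng.shuffle(idx)
        n_pairs = int(len(idx) * swap_rate / 2)
        # Pair consecutive shuffled records and exchange their geography.
        for a, b in zip(idx[0:2 * n_pairs:2], idx[1:2 * n_pairs:2]):
            out[a]["geo"], out[b]["geo"] = out[b]["geo"], out[a]["geo"]
    return out


data = [
    {"id": 1, "hh_size": 2, "geo": "north"},
    {"id": 2, "hh_size": 2, "geo": "south"},
    {"id": 3, "hh_size": 2, "geo": "east"},
    {"id": 4, "hh_size": 2, "geo": "west"},
    {"id": 5, "hh_size": 3, "geo": "north"},
]
swapped = swap_geography(data, ["hh_size"], swap_rate=1.0, seed=42)
for row in swapped:
    print(row)
```

Because partners agree on the matching variables, tables that cross geography with those variables are unchanged; only the link between geography and the remaining fields is perturbed.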

Failure to protect confidentiality in educational data
This document describes several examples from published educational data in which suppression failed to protect confidentiality.
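One common way suppression fails is when a small cell is hidden but the row total, or enough of the other cells, is still published, so the hidden value can be recovered by subtraction. The source does not say which failures its examples show; this is a hypothetical illustration, and the helper `recover_suppressed` and the counts are assumptions.

```python
def recover_suppressed(row, total):
    """If exactly one cell in `row` is suppressed (None) and the row total is
    published, return {cell: value}; otherwise return None."""
    missing = [k for k, v in row.items() if v is None]
    if len(missing) != 1:
        return None  # zero or several unknowns: not recoverable from this row alone
    known = sum(v for v in row.values() if v is not None)
    return {missing[0]: total - known}


# Hypothetical published row for one school: the small cell was suppressed,
# but the row total was still released.
school_x = {"Proficient": 48, "Not proficient": None}
print(recover_suppressed(school_x, total=51))
# {'Not proficient': 3}
```

Complementary suppression, which hides additional cells so that no suppressed value can be derived from the published margins, is the usual safeguard against this kind of recovery.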

Synthetic data in OnTheMap (LEHD)
The U.S. Census Bureau produces OnTheMap, which maps where people live and work, using partially synthetic data combined with other techniques such as suppression and noise addition.
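Partially synthetic data replace some variables with draws from a model while keeping the rest. The sketch below synthesizes a home area for each worker from a Dirichlet-multinomial model fit separately to each workplace. It is a simplified illustration, not the Census Bureau's OnTheMap algorithm; the function `synthesize_home`, the field names, and the prior weight `alpha` are assumptions.

```python
import random
from collections import Counter, defaultdict


def synthesize_home(records, home_areas, alpha=1.0, seed=0):
    """Return copies of `records` whose "home" value is replaced by a synthetic
    draw; "work" and all other fields are kept (partially synthetic).
    `home_areas` must list every possible home area, including observed ones."""
    rng = random.Random(seed)
    by_work = defaultdict(list)
    for i, r in enumerate(records):
        by_work[r["work"]].append(i)
    out = [dict(r) for r in records]
    for idx in by_work.values():
        counts = Counter(records[i]["home"] for i in idx)
        # Draw home-area probabilities from a Dirichlet(counts + alpha)
        # posterior by normalizing independent gamma variates.
        gammas = [rng.gammavariate(counts[h] + alpha, 1.0) for h in home_areas]
        total = sum(gammas)
        probs = [g / total for g in gammas]
        # Draw one synthetic home area per worker at this workplace.
        draws = rng.choices(home_areas, weights=probs, k=len(idx))
        for i, h in zip(idx, draws):
            out[i]["home"] = h
    return out


workers = (
    [{"work": "plant_1", "home": "tract_A"} for _ in range(8)]
    + [{"work": "plant_1", "home": "tract_B"} for _ in range(2)]
    + [{"work": "office_2", "home": "tract_C"} for _ in range(5)]
)
synthetic = synthesize_home(workers, ["tract_A", "tract_B", "tract_C"], alpha=0.5, seed=1)
print(Counter((r["work"], r["home"]) for r in synthetic))
```

A larger `alpha` pulls the synthetic home areas toward an even spread across `home_areas`, trading accuracy for protection; the number of workers at each workplace stays exactly as observed.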