Methods for Reducing Disclosure Risks When Sharing Data:
B.  Overviews of Statistical Disclosure Protection Methods


Most data disseminators alter data before sharing them with others.  Some of the more common approaches include

  • aggregation:  coarsen categorical data, e.g., do not release geographic units with fewer than 100,000 people
  • top-coding: report values (e.g., incomes, ages) above thresholds only as "above the threshold"
  • suppression:  make sensitive values missing in the released file
  • data swapping:  switch one record's values on key variables with another record's values
  • noise addition:  perturb sensitive data values by adding randomly generated noise to them
  • microaggregation:  cluster numerical data (e.g., incomes) in groups of at least three records, and replace each cluster member's value with the average value in its cluster
  • multiple imputation for disclosure limitation (also called synthetic data):  replace sensitive values with simulated values drawn from statistical models
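
As an illustration of three of the approaches listed above (top-coding, noise addition, and microaggregation), the following Python sketch applies each to a small list of made-up incomes.  It is a minimal sketch under stated assumptions, not any agency's production procedure:  the function names, the choice to recode top-coded values to the threshold itself, the Gaussian noise distribution, and the simple sort-and-group clustering rule are all choices made for this example.

```python
import random
from statistics import mean


def top_code(values, threshold):
    """Top-coding: values above the threshold are reported as the threshold.

    Assumption: the released file stores the threshold itself as the
    stand-in for "above the threshold".
    """
    return [min(v, threshold) for v in values]


def add_noise(values, sd, seed=None):
    """Noise addition: add zero-mean Gaussian noise to each value.

    Assumption: independent normal noise with standard deviation `sd`.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sd) for v in values]


def microaggregate(values, k=3):
    """Univariate microaggregation with clusters of at least k records.

    Assumption: clusters are formed by sorting the values and taking
    consecutive runs of k; a short final run is merged into the
    previous cluster so that every cluster has at least k members.
    """
    if len(values) < k:
        raise ValueError("need at least k records")
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    result = [0.0] * len(values)
    for group in groups:
        avg = mean(values[i] for i in group)
        for i in group:
            result[i] = avg  # every member gets its cluster's average
    return result


# Hypothetical incomes, invented for this example.
incomes = [21000, 35000, 48000, 52000, 67000, 250000, 90000]

print(top_code(incomes, 100000))
print(add_noise(incomes, sd=1000.0, seed=0))
print(microaggregate(incomes, k=3))
```

In practice, the threshold, the noise variance, and the cluster size are chosen by weighing disclosure risk against the loss of analytic usefulness.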

The general references below describe some of the benefits and limitations of these approaches.  See the page on Further References for details on specific topics.  Please feel free to contact the chair of the ASA Privacy and Confidentiality Committee for additional guidance.

1.  Duncan, G.T., M. Elliot, and J.J. Salazar-González (2011).  Statistical Confidentiality:  Principles and Practice, New York:  Springer.

2.  Hundepool, A., J. Domingo-Ferrer, L. Franconi, S. Giessing, E. Schulte Nordholt, K. Spicer, and P.-P. de Wolf (2012).  Statistical Disclosure Control, West Sussex:  Wiley.

3.  Statistical Policy Working Paper 22 - Report on Statistical Disclosure Limitation Methodology (2005)

Extremely useful for newcomers to this field. The opening chapters are especially valuable for those interested in a non-technical treatment of essential concepts and techniques. Following a description of federal agency practices as of the early 1990s, the report provides more technical discussions of disclosure limitation methodology for both tabular data and microdata files. It concludes with a list of recommendations and a research "agenda," and contains an extensive annotated bibliography.

4.  Willenborg, L., and T. de Waal (2000).  Elements of Statistical Disclosure Control, Lecture Notes in Statistics, Vol. 155, New York:  Springer.

This book describes common statistical methods of disclosure control, including those listed above.  The authors also wrote a book in 1996, Statistical Disclosure Control in Practice, that is less mathematically technical than the 2000 book.