Business and Tax Data

Protecting Business and Tax Data: Special Issues and Applications

Business and tax data are among the most sensitive data collected by government agencies and researchers. These data often contain highly skewed variables, which put the largest records at particular risk of disclosure. For example, given the actual total payroll of each manufacturer in the aircraft industry, it may be relatively easy to identify Boeing: it is the record with the largest payroll. Furthermore, businesses and individuals understandably want to guard the privacy of this information. Private companies do not want their competitors to know the amounts they spend on marketing, research and development, payroll, etc., as this might compromise their business practices. Individuals, likewise, may be reluctant for others to learn their salaries or total incomes.

If data collectors disseminated business and tax data in ways that resulted in harm to businesses and individuals, data subjects might not be willing to provide their data. This would damage the government's ability to make economic policy and reduce researchers' opportunities to analyze economic data. Thus, most business and tax data, if released at all (in fact, there are no public use business microdata available in the U.S.), are altered before release.

Nearly all the typical alteration strategies are applied to business and tax data; see the Methods tab at the top of this page for explanations of the methods. Below are links to illustrative applications of confidentiality protections to business and tax data. This list is by no means exhaustive, but it does illustrate the techniques typically used to protect these data.

Aggregation in the County Business Patterns (CBP)
Business and tax microdata are frequently aggregated for public use. This link to the CBP, released by the Census Bureau, illustrates how establishments' payroll and employee size are aggregated to create public use tables.
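
As a rough illustration of the idea, the sketch below collapses establishment-level records into county-by-industry cells. The records, field names, and codes are invented for this example; it is not the Census Bureau's production procedure for the CBP.

```python
from collections import defaultdict

# Hypothetical establishment-level microdata (all values invented).
establishments = [
    {"county": "001", "industry": "311", "employees": 12, "payroll": 480},
    {"county": "001", "industry": "311", "employees": 250, "payroll": 11200},
    {"county": "001", "industry": "336", "employees": 40, "payroll": 2100},
    {"county": "003", "industry": "311", "employees": 7, "payroll": 260},
    {"county": "003", "industry": "311", "employees": 95, "payroll": 4300},
]


def aggregate(records):
    """Sum establishment records into (county, industry) cells."""
    cells = defaultdict(
        lambda: {"establishments": 0, "employees": 0, "payroll": 0}
    )
    for r in records:
        cell = cells[(r["county"], r["industry"])]
        cell["establishments"] += 1
        cell["employees"] += r["employees"]
        cell["payroll"] += r["payroll"]
    return dict(cells)


table = aggregate(establishments)
for key in sorted(table):
    print(key, table[key])
```

Note that the cell for county 001, industry 336 contains a single establishment, so its totals reveal that establishment's values exactly. This is one reason aggregation is usually combined with other protections.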

Noise addition in the Commodity Flow Survey (CFS)
This paper illustrates how noise can be added to underlying economic microdata when the released data are tabular. The CFS is released by the Census Bureau.
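
One way to picture noise addition for tabular releases is to multiply each establishment's value by a random factor close to, but never equal to, 1 before the values are summed into a table cell. The sketch below assumes this multiplicative form; the 5 to 15 percent range, the function names, and the shipment values are hypothetical choices for illustration and are not taken from the CFS.

```python
import random


def noise_factor(rng, low=0.05, high=0.15):
    """Return a multiplier that differs from 1 by between low and high.

    The direction (up or down) is chosen at random, so no value is
    ever released with a factor of exactly 1.
    """
    magnitude = rng.uniform(low, high)
    direction = rng.choice([-1, 1])
    return 1 + direction * magnitude


def noisy_total(values, rng):
    """Perturb each underlying value, then sum to get the cell total."""
    return sum(v * noise_factor(rng) for v in values)


rng = random.Random(2024)  # fixed seed so the example is repeatable
shipments = [120.0, 80.0, 4500.0, 60.0]  # invented establishment values
true_total = sum(shipments)
released_total = noisy_total(shipments, rng)
print(true_total, round(released_total, 1))
```

In cells with many similarly sized contributors, upward and downward perturbations tend to offset one another, while a cell dominated by one large establishment (4500 here) inherits most of that establishment's noise.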

Noise addition and Synthetic Data in the Longitudinal Employer-Household Dynamics (LEHD) Program
This presentation provides an example of adding noise and using synthetic data in establishment-level data. The LEHD program is run by the Census Bureau.

Microaggregation in the Individual Tax Model Public Use File (ITMPUF)
This link is to a paper in the 2002 proceedings of the Joint Statistical Meetings that describes the microaggregation strategy used for the ITMPUF, which is released by the Statistics of Income division of the Internal Revenue Service.
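
For readers unfamiliar with the term, the following is a minimal sketch of univariate microaggregation: records are sorted by a sensitive value, split into groups of at least k, and each value is replaced by its group mean. The group size, function name, and income figures are assumptions made for illustration; the procedure actually used for the ITMPUF is described in the linked paper and may differ.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k records.

    Groups are formed from records that are adjacent after sorting.
    A leftover group smaller than k is merged into the preceding group.
    If there are fewer than k records, they form one group.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        if n - end < k:  # too few records left to form another full group
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


incomes = [52, 2500, 30, 90, 45, 75, 61]  # invented values, one outlier
protected = microaggregate(incomes, k=3)
print(protected)
```

Because every group is replaced by its own mean, the overall total is preserved, while the extreme value 2500 is blended with its nearest neighbors and no longer appears in the output.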

Synthetic data in the Survey of Consumer Finances
The Federal Reserve Board protects sensitive monetary values by replacing them with multiple imputations. This is the first published instance of what is now known as partially synthetic data.
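
The general idea of partially synthetic data can be sketched as follows: only the values judged sensitive are replaced with draws from a model fitted to the data, and several such copies are produced. The example assumes a simple lognormal model, a hypothetical threshold for what counts as sensitive, and invented asset values. It also ignores uncertainty in the fitted parameters, which a proper multiple-imputation procedure would account for, and it is not the Federal Reserve Board's method.

```python
import math
import random
import statistics


def partially_synthesize(values, is_sensitive, m=3, seed=0):
    """Return m copies of values with sensitive entries replaced.

    Replacements are drawn from a lognormal distribution fitted to all
    values (assumed positive). Non-sensitive entries are left unchanged.
    """
    rng = random.Random(seed)
    logs = [math.log(v) for v in values]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    copies = []
    for _ in range(m):
        copy = [
            math.exp(rng.gauss(mu, sigma)) if is_sensitive(v) else v
            for v in values
        ]
        copies.append(copy)
    return copies


assets = [12_000, 48_000, 95_000, 310_000, 7_500_000]  # invented values
threshold = 1_000_000  # hypothetical cutoff for "sensitive"
synthetic_copies = partially_synthesize(assets, lambda v: v > threshold, m=3)
for copy in synthetic_copies:
    print([round(v) for v in copy])
```

An analyst would run the same analysis on each copy and combine the results, so that the variability across copies reflects the uncertainty introduced by the replacement.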

Synthetic data in the Longitudinal Business Database (LBD)
The U.S. Bureau of the Census is developing a partially synthetic public use data set for the LBD. This working paper summarizes some of the initial development.