In this post, I discuss the importance of performing sensitivity analysis with respect to key assumptions made in statistical analyses. This is particularly important when analyzing crime data, which may have inherent limitations that preclude us from studying certain questions without making strong assumptions. I will replicate, and expand upon, an analysis from a very interesting paper by Zhao et al. (2022). I provide some basic details about the analysis here, but I certainly encourage readers to take a look at the full manuscript, which covers a number of other issues that may be of interest as well. The paper examines the difficult issue of racial bias in policing, and proposes a particular estimand that is policy-relevant and easier to estimate given the underlying limitations of the observed data. The goal of this post is not to add to the discussion of this difficult issue, but rather to show, using publicly available data, how sensitivity analysis can be useful in analyses that require strong, unverifiable assumptions.
Before going into any details, let us first define some notation and estimands. I denote the data as \((Y_i, M_i, D_i, X_i)\) for \(i=1, \dots, n\), where the unit \(i\) corresponds to any encounter between the police and a civilian, whether or not it leads to the police stopping said civilian. The outcome \(Y_i\) is whether force is used on the civilian, \(M_i\) corresponds to whether the civilian was stopped, \(D_i\) is a binary indicator of race where \(D=1\) corresponds to Black and \(D=0\) to all other races, and \(X_i\) are characteristics of the individual. I also need to define potential outcomes, as this post addresses an inherently causal quantity. Let \(M(d)\) represent whether someone would be stopped if their race were set to \(d\). Similarly, let \(Y(d,m)\) represent whether someone would have force used against them if their race were \(d\) and their stop status were \(m\). Note that by definition \(Y(d,0) = 0\), since someone cannot have force used against them if they are not stopped. Also note that \(Y(d) = Y(d, M(d))\), which is the outcome that would have been observed if race were set to \(d\) and \(M\) is allowed to take whatever value it would have taken if race were set to \(d\).
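To make this notation concrete, here is a minimal simulated sketch (all names and values are hypothetical, not from the actual analysis) encoding the two consistency rules above: \(Y(d,0)=0\) and \(Y(d)=Y(d, M(d))\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # a handful of hypothetical encounters

# Potential outcomes: M(d) is stop status if race were set to d,
# and Y(d, m) is force status under race d and stop status m.
M1 = rng.integers(0, 2, n)           # M(1)
M0 = rng.integers(0, 2, n)           # M(0)
Y1_stop = rng.integers(0, 2, n)      # Y(1, 1)
Y0_stop = rng.integers(0, 2, n)      # Y(0, 1)

# By definition, force cannot be used without a stop: Y(d, 0) = 0.
Y1_nostop = np.zeros(n, dtype=int)
Y0_nostop = np.zeros(n, dtype=int)

# Composition: Y(d) = Y(d, M(d)).
Y1 = np.where(M1 == 1, Y1_stop, Y1_nostop)
Y0 = np.where(M0 == 1, Y0_stop, Y0_nostop)
```

In particular, whenever \(M(d) = 0\) the composite outcome \(Y(d)\) is forced to 0, which is exactly the structural restriction used throughout the paper.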
A major difficulty with this data is that it only contains interactions that lead to a stop, which means that the observed data consists only of observations for which \(M_i = 1\). Because of this, studying any quantity in the general population of police-civilian encounters (one that does not condition on \(M=1\)) is inherently difficult. Due to this limitation, the authors propose to examine the causal risk ratio, which is defined as \[CRR(x) = \frac{P(Y(1) = 1 \mid X = x)}{P(Y(0) = 1 \mid X = x)}.\] This is the ratio of the probabilities of force for a Black individual compared with a non-Black individual with characteristics \(x\). As discussed in the manuscript, this estimand is useful because it avoids having to estimate \(P(M = 1)\), which is generally not feasible. Under certain assumptions, this quantity can be written as \[\begin{align} CRR(x) = \frac{P(Y = 1 \mid D=1, M=1, X=x)}{P(Y = 1 \mid D=0, M=1, X=x)} \frac{P(D = 1 \mid M=1, X=x)}{P(D = 0 \mid M=1, X=x)} \frac{P(D = 0 \mid X=x)}{P(D = 1 \mid X=x)}. \end{align}\] This corresponds to the value of the causal risk ratio for a particular covariate value \(x\), but in order to more easily summarize the results, I will examine an average causal risk ratio that averages over the distribution of covariates, which I define as \[\overline{CRR} = E(CRR(X_i)).\] The results can be found in Figure 1, where one can see that the estimate of \(\overline{CRR}\) is well above 1 for nearly every precinct in the city. This mirrors the results found in Zhao et al. (2022) and indicates that Black individuals are more likely to have force used against them. Note that I have attached code to replicate this analysis along with this blog post if you’re interested in exploring it.
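To illustrate how this identification formula can be turned into a plug-in estimator, here is a hedged sketch on fully simulated data. The data-generating process, function names, and all numbers are hypothetical; in the real analysis the final factor \(P(D=0 \mid X=x)/P(D=1 \mid X=x)\) would come from external population data (e.g., census records) rather than from the stop records, and the first two factors would be model-based rather than raw frequencies.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 400_000

# Simulate a full population of encounters; the analyst only sees stops (m == 1).
x = rng.integers(0, 2, n)                             # one binary covariate
d = rng.binomial(1, 0.3 + 0.1 * x)                    # race indicator
m = rng.binomial(1, 0.2 + 0.1 * d + 0.1 * x)          # stop indicator
y = m * rng.binomial(1, 0.05 + 0.05 * d + 0.02 * x)   # force only if stopped

df = pd.DataFrame({"x": x, "d": d, "m": m, "y": y})
stops = df[df.m == 1]

def crr_hat(xval):
    """Plug-in estimate of CRR(x) from the three factors of the identity."""
    s = stops[stops.x == xval]
    pop = df[df.x == xval]   # stands in for external population race data
    f1 = s.loc[s.d == 1, "y"].mean() / s.loc[s.d == 0, "y"].mean()
    f2 = (s.d == 1).mean() / (s.d == 0).mean()
    f3 = (pop.d == 0).mean() / (pop.d == 1).mean()
    return f1 * f2 * f3

# Average CRR over the covariate distribution.
p_x = df["x"].value_counts(normalize=True)
crr_bar = sum(p_x[v] * crr_hat(v) for v in p_x.index)
```

Under this data-generating process the true \(CRR(0)\) is \((0.10 \times 0.3)/(0.05 \times 0.2) = 3\), and the plug-in estimate recovers it up to sampling error, even though only stopped encounters (plus the external race distribution) are used.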
This analysis rests on a number of untestable assumptions, however, and violations of these assumptions could bias the results. For today’s discussion, I will focus on the assumption that there are no unmeasured confounders of the \(D - Y\) relationship. If this assumption is violated, then the expression in equation (1) no longer corresponds to \(CRR(x)\), and there will likely be bias in the estimates. The goal is to assess how robust the findings are to violations of this assumption by seeing how large those violations must be in order to remove the significant effects seen in Figure 1. Suppose that there is some unmeasured variable \(U\) that affects both \(D\) and \(Y\). Let us define the true values of \(CRR(x)\) and \(\overline{CRR}\) as \(CRR^*(x)\) and \(\overline{CRR}^*\), respectively. These are the values one would obtain by correctly adjusting for both \(X\) and \(U\) throughout, whereas \(CRR(x)\) and \(\overline{CRR}\) are the values obtained when adjusting only for \(X\) and ignoring the presence of \(U\). First, one can write \[CRR^*(x) = CRR(x) \times \xi(x),\] where \(\xi(x) = 1\) corresponds to no bias; under unmeasured confounding it may be that \(\xi(x) \neq 1\), in which case there is a difference between the true value and the one obtained. Interestingly, one can write down exactly what \(\xi(x)\) is in this case: \[\xi(x) = \frac{E_{U \mid M(1)=1, X=x} \bigg[ \frac{f(U \mid M=1, D=1, Y=1, X=x)}{f(U \mid M=1, D=1, X=x)} \bigg]}{E_{U \mid M(0)=1, X=x} \bigg[ \frac{f(U \mid M=1, D=0, Y=1, X=x)}{f(U \mid M=1, D=0, X=x)} \bigg]}.\] Here \(f()\) denotes a density, and \(E_{U \mid M(1)=1, X=x}\) is the expectation over the distribution of \(U\) given \(X=x\) and \(M(1)=1\).
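One can see this formula in action by computing it directly in a simulation with a binary \(U\) and a single covariate stratum. This is purely illustrative: the potential stop indicators \(M(1)\) and \(M(0)\), which define the outer expectations, are available only because the data are simulated, and all probabilities below are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000

def simulate(gamma):
    """One covariate stratum; U affects D, and affects Y when gamma > 0."""
    u = rng.binomial(1, 0.5, n)
    d = rng.binomial(1, 0.3 + 0.2 * u)
    eps = rng.uniform(size=n)
    m1 = (eps < 0.4).astype(int)      # potential stop status M(1)
    m0 = (eps < 0.3).astype(int)      # potential stop status M(0)
    m = np.where(d == 1, m1, m0)      # observed stop status
    y = m * rng.binomial(1, 0.1 + 0.1 * d + gamma * u)
    return u, d, m1, m0, m, y

def xi_hat(u, d, m1, m0, m, y):
    """Empirical version of the xi(x) formula for a binary U.

    The weights f(U | M(d) = 1) use the potential outcomes m1/m0,
    which are known here only because the data are simulated."""
    def term(dd, md):
        w = np.array([np.mean(u[md == 1] == v) for v in (0, 1)])
        num = np.array([np.mean(u[(m == 1) & (d == dd) & (y == 1)] == v)
                        for v in (0, 1)])
        den = np.array([np.mean(u[(m == 1) & (d == dd)] == v) for v in (0, 1)])
        return np.sum(w * num / den)
    return term(1, m1) / term(0, m0)

xi_null = xi_hat(*simulate(0.0))   # U does not affect Y given (M, D)
xi_conf = xi_hat(*simulate(0.3))   # U affects both D and Y
print(xi_null, xi_conf)
```

When \(U\) does not affect \(Y\) given \((M, D)\), the density ratios inside both expectations are 1 and the computed \(\xi\) is 1 up to sampling noise; when \(U\) affects both \(D\) and \(Y\), it moves away from 1 (in this particular setup, to roughly 0.81).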
At first, this quantity may not appear intuitive, but it has some features that are useful to note. First, if \(U\) and \(Y\) are independent given \((M, D, X)\), then it immediately equals 1, because \(Y=1\) drops out of all the conditioning statements and both the numerator and denominator of \(\xi(x)\) are 1. While less obvious, one can also show that if \(U\) and \(D\) are independent given \((M(d), X)\), then both the numerator and denominator are 1 and \(\xi(x) = 1\). This matches the usual notion of confounding: there is no bias unless the unmeasured variable affects both \(D\) and \(Y\). Now that \(\xi(x)\) has been introduced, one can bound the bias due to unmeasured confounding. In what remains, for simplicity I will write \(CRR\) for \(CRR(X)\) and \(\xi\) for \(\xi(X)\), though both remain functions of \(X\) throughout. Specifically, one can see that \[\begin{align*} \overline{CRR} - \overline{CRR}^* &= E \left( CRR - CRR^* \right) \\ &= E \left( CRR (1 - \xi) \right) \\ &= -\text{Corr}(CRR, \xi) \sqrt{\text{Var}(CRR) \text{Var}(\xi)} + E(CRR) (1 - E(\xi)). \end{align*}\] The bias of the causal effect therefore depends on three quantities that cannot be estimated from the observed data because they involve \(\xi\): the correlation \(\text{Corr}(CRR, \xi)\), the variance \(\text{Var}(\xi)\), and the mean \(E(\xi)\).
Correlations are necessarily bounded between -1 and 1, however, so one can place an upper bound on the bias as a function of the remaining two sensitivity parameters: \[| \overline{CRR} - \overline{CRR}^*| \leq \sqrt{\text{Var}(CRR) \text{Var}(\xi)} + E(CRR) | 1 - E(\xi)|.\] For more details on these calculations, see Huang et al. (2026) or related ideas from sensitivity analyses for IPW estimators (Hong et al. (2021), Huang (2024)). This is great, because one can now see how large the bias can be as a function of two sensitivity parameters, and can explore how large those parameters must become to make any effects disappear. This is typically the aim of a sensitivity analysis: write down or bound the bias in terms of sensitivity parameters, and then reason about plausible values of those parameters. The one issue in this setting is that these sensitivity parameters are not particularly interpretable and are difficult to reason about. Often, sensitivity parameters are expressed in terms of partial \(R^2\) values or regression coefficients that are easy to think about, but here the parameters are moments of random variables that are themselves ratios of densities. If I were to ask a criminologist what a reasonable value of these would be when a key covariate is missing, they certainly would not know. There are a couple of paths forward that I won’t discuss in detail here, but will briefly mention, since interpretability of the sensitivity parameters is a key part of any sensitivity analysis. For one, you can run simple simulation studies that generate data with unmeasured confounding of varying strength and track what happens to \(E(\xi)\) and \(\text{Var}(\xi)\). This gives a better sense of what values represent a high degree of confounding.
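As a small illustration of how one might explore the bound, the sketch below plugs hypothetical values of \(E(\xi)\) and \(\text{Var}(\xi)\) into the inequality above, using made-up precinct-level CRR estimates, to see when the worst-case corrected \(\overline{CRR}^*\) could be pushed below the null value of 1.

```python
import numpy as np

# Hypothetical precinct-level estimates of CRR(x); in the real analysis
# these would come from the fitted models, one per precinct.
crr = np.array([1.8, 2.1, 2.5, 1.6, 2.9, 2.2])
e_crr, v_crr = crr.mean(), crr.var()

def bias_bound(e_xi, v_xi):
    """|E(CRR) - E(CRR*)| <= sqrt(Var(CRR) Var(xi)) + E(CRR) |1 - E(xi)|."""
    return np.sqrt(v_crr * v_xi) + e_crr * abs(1 - e_xi)

# Worst-case corrected average CRR over a grid of sensitivity parameters.
for e_xi in (1.0, 0.9, 0.7, 0.5):
    for v_xi in (0.0, 0.01, 0.05):
        lower = e_crr - bias_bound(e_xi, v_xi)
        print(f"E(xi)={e_xi:.1f}, Var(xi)={v_xi:.2f}: CRR* >= {lower:.2f}")
```

With these illustrative numbers, \(E(\xi)\) has to fall to roughly 0.5 before the worst-case corrected average CRR drops below 1, which is the kind of "how strong would confounding have to be" statement a sensitivity analysis aims to produce.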
Possibly even better, one can use the observed covariates to “benchmark” reasonable values of these sensitivity parameters. For a full description of how to do this formally, see Cinelli and Hazlett (2020) or Huang et al. (2026), who implemented this in a related context. The code provided with this blog post does a simpler form of benchmarking that is easier to implement but still provides useful results. These results can be found in Figure 2, where it is assumed that the sensitivity parameters from an unmeasured confounder are 3 times as large as those obtained from excluding the most important observed covariate in the data. This is quite conservative, because it assumes the unmeasured confounder is far more problematic than any of the observed covariates. Despite this, nearly all of the intervals remain well above the null value of 1, indicating that Black individuals are more likely to have force used against them. Importantly, this sensitivity analysis shows that the estimated effects are robust to the potential presence of unmeasured confounding, as it would take an exceedingly strong unmeasured confounder to make the estimates of the CRR statistically insignificant. Note, though, that this analysis rests on other assumptions that I did not detail in the post, which could also bias the estimates; sensitivity analyses for those assumptions could be carried out in a similar manner to further assess the robustness of the overall findings.
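A rough sketch of the benchmarking arithmetic, with entirely illustrative numbers: the benchmark values would in practice come from re-running the analysis without the most important observed covariate, and the scaling convention used here (multiplying the deviation of \(E(\xi)\) from 1, and \(\text{Var}(\xi)\), by 3) is one plausible choice, not necessarily the one used in the attached code.

```python
import numpy as np

# Illustrative benchmark values: how much E(xi) and Var(xi) moved when the
# most important observed covariate was dropped from the analysis.
e_xi_bench, v_xi_bench = 0.97, 0.004

# Conservative assumption: the unmeasured confounder is 3x the benchmark.
k = 3
e_xi = 1 + k * (e_xi_bench - 1)   # scale the deviation of E(xi) from 1
v_xi = k * v_xi_bench

# Illustrative estimate of the average CRR and its variance across strata.
e_crr, v_crr = 2.2, 0.18

bound = np.sqrt(v_crr * v_xi) + e_crr * abs(1 - e_xi)
lower = e_crr - bound
print(f"worst-case corrected CRR: {lower:.2f}")
```

If the worst-case corrected value stays above 1 even under this 3x benchmark, the qualitative conclusion survives a confounder considerably stronger than anything observed, which is the sense in which the findings in Figure 2 are robust.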
Replication code files: BlogAnalysis.r, Benchmarking.r
References
Zhao, Q., Keele, L. J., Small, D. S., & Joffe, M. M. (2022). A note on posttreatment selection in studying racial discrimination in policing. American Political Science Review, 116(1), 337-350.
Huang, Z., Beck, B., & Antonelli, J. (2026). Causal inference and racial bias in policing: New estimands and the importance of mobility data. Journal of the Royal Statistical Society Series A: Statistics in Society. To appear.
Cinelli, C., & Hazlett, C. (2020). Making sense of sensitivity: Extending omitted variable bias. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82(1), 39-67.
Hong, G., Yang, F., & Qin, X. (2021). Did you conduct a sensitivity analysis? A new weighting-based approach for evaluations of the average treatment effect for the treated. Journal of the Royal Statistical Society Series A: Statistics in Society, 184(1), 227-254.
Huang, M. Y. (2024). Sensitivity analysis for the generalization of experimental results. Journal of the Royal Statistical Society Series A: Statistics in Society, 187(4), 900-918.