> I want to re-weight the data so the fitted model will be theoretically applicable to a population with a different proportion of females from the analytic sample. ... the resulting model should be unbiased, although the reported coefficient standard errors might be misleading. Will applying the WEIGHT option in SAS accomplish this? If so, how does it actually do the weighting within the procedure? (I mean mathematically, not how to program). If not, is there a better approach?
I hope I have not misunderstood, because this seems like it has such a simple, natural solution. Isn't this question asking about how to make a prediction for a future population based on a model estimated from a sample? Why, then, should the procedure be any different from any other regression-based prediction? In particular, why should any special weighting be necessary? Just include gender as a variable in the model and use that for the prediction.
To understand what this does, consider the simplest case where the original logistic regression model (without gender) merely fits a constant beta to a binary response. With k 1's and n-k 0's in the dataset, the likelihood is maximized for beta = log(k) - log(n-k), corresponding to an estimated probability of k/n. Applying this to a population of N people, we would estimate that N(k/n) 1's would occur in the population.
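As a quick check of the constant-only case, here is a minimal sketch in Python (not SAS). The function names and the example counts are made up for illustration; the only thing taken from the argument above is beta = log(k) - log(n-k) and the resulting estimate N(k/n).

```python
import math

def intercept_only_fit(k, n):
    """MLE for a logistic model with only a constant term.

    k = number of 1's, n = sample size (hypothetical helper, for illustration).
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n for a finite estimate")
    beta = math.log(k) - math.log(n - k)      # log-odds of a 1
    p_hat = 1.0 / (1.0 + math.exp(-beta))     # inverse logit, equals k/n
    return beta, p_hat

def expected_ones(k, n, N):
    """Estimated number of 1's in a population of N people."""
    _, p_hat = intercept_only_fit(k, n)
    return N * p_hat

# Illustrative numbers only: 30 ones in a sample of 100, population of 5000.
beta, p_hat = intercept_only_fit(k=30, n=100)
print(beta, p_hat, expected_ones(30, 100, 5000))
```

Back-transforming beta through the inverse logit recovers k/n, which is why the population estimate reduces to N(k/n).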
Including gender in this simple example is tantamount to dividing the data into km 1's for nm males and kf = k-km 1's for nf = n-nm females. The likelihood is maximized when the male probability equals km/nm and the female probability equals kf/nf. To apply this to a population with f females and m males, we would estimate the number of 1's to be f(kf/nf) + m(km/nm). This is straightforward, easy to interpret, and flexible (because it can be applied to any future population without any refitting of the data). Standard errors of prediction are just as easily propagated, especially if f and m are known and not estimated with any error.
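The gender-specific estimate and one way of propagating its standard error can be sketched as follows, again in Python rather than SAS. The function name and the example counts are hypothetical. The standard error is for the estimated expected count only, and it assumes that f and m are known exactly, that the male and female samples are independent, and that the usual plug-in binomial variance p(1-p)/n is adequate for each group.

```python
import math

def predict_ones(km, nm, kf, nf, m, f):
    """Estimate the number of 1's in a population of m males and f females.

    km, nm: 1's and sample size among males; kf, nf: the same for females.
    Returns (estimate, standard error of the estimate).
    """
    pm = km / nm                      # fitted male probability
    pf = kf / nf                      # fitted female probability
    estimate = f * pf + m * pm        # f(kf/nf) + m(km/nm)

    # Assumption: independent groups, plug-in binomial variance for each
    # proportion, and f, m treated as fixed constants.
    variance = f**2 * pf * (1 - pf) / nf + m**2 * pm * (1 - pm) / nm
    return estimate, math.sqrt(variance)

# Illustrative numbers only: a mostly male sample applied to a 50/50 population.
estimate, se = predict_ones(km=20, nm=80, kf=10, nf=20, m=500, f=500)
print(estimate, se)
```

Changing m and f re-targets the prediction to a different gender mix without touching the fitted proportions, which is the point of the argument: no refitting or reweighting is involved.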
In the more complex case with additional explanatory variables, some assumptions must be made about their distributions within the future population. Nevertheless, the concept and method still work: apply the prediction from the fitted model to the future population. Including gender as one of the explanatory variables automatically performs the desired "weighting." There does not appear to be any need to weight the fitting procedure beforehand.
Best,
Bill Huber
Quantitative Decisions