ASA Connect

  • 1.  Question on Bias Correction in Log-Log Modeling

    Posted 8 days ago
    Edited by Isabella Ghement 7 days ago

    Hi everyone, 

    If we start with a log-log linear model of the form log(Y) = alpha + beta*log(X) + error, we know we must apply some form of bias correction (BC) when we back-transform the fitted relationship a + b*log(X) so that we are estimating the conditional mean of Y (rather than the conditional median of Y). Here, log is the natural logarithm.

    The bias-corrected point estimate would look like this: BC*exp(a + b*log(X)). For instance, if the error term is Normal with mean 0 and variance sigma^2, then BC = exp(0.5*sigma^2), which in practice is estimated by exp(0.5*s^2), where s^2 is the sample variance of the residuals.
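
    For concreteness, here is a small simulation sketch of this back-transformation bias (purely illustrative; the values alpha = 1, beta = 0.8, sigma = 0.5 and x0 = 5 are arbitrary, and the fit is ordinary least squares on the log-log scale):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    alpha, beta, sigma = 1.0, 0.8, 0.5

    x = rng.uniform(1, 10, size=n)
    log_y = alpha + beta * np.log(x) + rng.normal(0, sigma, size=n)

    # least-squares fit of log(Y) on log(X)
    X = np.column_stack([np.ones(n), np.log(x)])
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    a, b = coef
    s2 = np.var(log_y - X @ coef, ddof=2)   # sample variance of the residuals

    x0 = 5.0
    true_mean = np.exp(alpha + beta * np.log(x0) + 0.5 * sigma**2)  # E(Y | X = x0)
    naive     = np.exp(a + b * np.log(x0))       # plain back-transform (close to the conditional median)
    corrected = np.exp(0.5 * s2) * naive         # with the bias correction BC = exp(0.5*s^2)

    print(true_mean, naive, corrected)   # naive sits below true_mean; corrected is close to it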

    What intrigues me is this: why should the same bias correction NOT be applied to the endpoints of the confidence interval for the conditional mean of log(Y), or of the prediction interval for a new value of log(Y), when log(X) is known?

    The only explicit reference I could find backing up this advice comes from here: 

    https://stat.ethz.ch/education/semesters/as2015/asr/Script_v151119.pdf

    I confess I do not understand the justification offered there for only applying the bias correction to the point estimate but not to confidence intervals or prediction intervals centered around that estimate on the log-log scale. What quantiles are we dealing with here? (The only quantiles I can think of are the critical values involved in the construction of these intervals.) 

    Maybe I am missing something obvious here, but shouldn't we apply the bias correction both to the back-transformed point estimate AND to the back-transformed endpoints of the confidence/prediction intervals centered around that estimate? 

    Thanks in advance for helping me make sense of this. 

    Isabella 

     

     



    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 2.  RE: Question on Bias Correction in Log-Log Modeling

    Posted 6 days ago

    Hi Isabella,

    I think you are right concerning confidence intervals but wrong concerning prediction intervals. Or rather, it depends on the exact kind of confidence interval. 

    If log(y) = a + b log(x) + error, then y = exp(a + b log(x)) * exp(error). If error has mean 0, then exp(error) has mean > 1 (Jensen's inequality). Therefore (assuming x and error to be independent), E(y|x) is larger than exp(a + b log(x)) (= exp(E(log y|x))).

    If we have an unbiased estimate (e.g. the least squares estimate) for E(log y|x), the exponential of this estimate will be negatively biased for  E(y|x). Therefore, a bias correction is called for (e.g. the one you suggested).
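
    As a quick numerical sanity check of that step (illustrative only; sigma = 0.5 is arbitrary): for a Normal(0, sigma^2) error, E(exp(error)) = exp(sigma^2/2) > 1, which is exactly the correction factor you quoted.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 0.5
    err = rng.normal(0, sigma, size=1_000_000)

    print(np.exp(err).mean())       # Monte Carlo estimate of E(exp(error)), about 1.13
    print(np.exp(0.5 * sigma**2))   # theoretical value exp(sigma^2/2)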

    Let's look at prediction intervals for log y and y (conditional on x), respectively.

    If [lo, up] is a 95% prediction interval for log y, this means that with 95% probability, the following inequality is true:

    lo <= log y <= up  (ineq 1)

    Since exp() is a strictly increasing function, (ineq 1) is equivalent to

    exp(lo) <= exp(log y) <= exp(up), or

    exp(lo) <= y <= exp(up)  (ineq 2).

    Since (ineq 1) and (ineq 2) are equivalent, they are either both true or both untrue. If (ineq 1) is true with probability 95%, so is (ineq 2). And this means that [exp(lo), exp(up)] is a 95% prediction interval for y. No bias correction needed.
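
    A tiny simulation confirms that nothing is lost in this back-transformation (illustrative only; for simplicity I treat the centre mu, which stands in for a + b*log(x0), and sigma as known, so the 95% limits are just mu +/- 1.96*sigma):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma = 2.0, 0.5
    log_y_new = mu + rng.normal(0, sigma, size=100_000)
    y_new = np.exp(log_y_new)

    lo, up = mu - 1.96 * sigma, mu + 1.96 * sigma        # 95% prediction limits for log y
    inside_log = (lo <= log_y_new) & (log_y_new <= up)
    inside_raw = (np.exp(lo) <= y_new) & (y_new <= np.exp(up))

    print(inside_log.mean(), inside_raw.mean())   # both about 0.95
    print(np.array_equal(inside_log, inside_raw)) # True: same event, sample by sample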

    Let's now look at confidence intervals for E(log y) and E(y) (conditional on x), respectively.

    If [lo, up] is a 95% confidence interval for E(log y), this means that in 95% of all samples the following inequality is true:

    lo <= E(log y) <= up  (ineq 3).

    Since exp() is a strictly increasing function, (ineq 3) is equivalent to

    exp(lo) <= exp(E(log y)) <= exp(up)  (ineq 4).

    Since (ineq 3) and (ineq 4) are equivalent, they are either both true or both untrue. If (ineq 3) is true 95% of the time, so is (ineq 4). And this means that [exp(lo), exp(up)] is a 95% confidence interval for exp(E(log y)). No bias correction needed.

    BUT: we are (probably) not interested in a confidence interval for exp(E(log y)). We want a confidence interval for E(y). That is not the same thing: we know that E(y) = exp(E(log y)) * E(exp(error)). So let's multiply (ineq 4) by E(exp(error)):

    exp(lo)*E(exp(error)) <= exp(E(log y))*E(exp(error)) <= exp(up)*E(exp(error)), or

    exp(lo)*E(exp(error)) <= E(y) <= exp(up)*E(exp(error))  (ineq 5).

    And this means that [exp(lo)*E(exp(error)), exp(up)*E(exp(error))] is a 95% confidence interval for E(y). Here, we need bias correction for the endpoints. 
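
    Here is a small coverage simulation of exactly this contrast (illustrative only: it uses an intercept-only model log y = mu + error and multiplies the endpoints by the true E(exp(error)) = exp(sigma^2/2); in practice that factor would itself have to be estimated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    mu, sigma, n, reps = 2.0, 0.5, 50, 20_000
    E_y = np.exp(mu + 0.5 * sigma**2)          # true E(y)
    bc = np.exp(0.5 * sigma**2)                # E(exp(error))

    hits_exp_Elogy = hits_Ey_naive = hits_Ey_bc = 0
    for _ in range(reps):
        log_y = rng.normal(mu, sigma, size=n)
        m, s = log_y.mean(), log_y.std(ddof=1)
        half = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
        lo, up = m - half, m + half            # 95% confidence interval for E(log y) = mu
        hits_exp_Elogy += np.exp(lo) <= np.exp(mu) <= np.exp(up)
        hits_Ey_naive  += np.exp(lo) <= E_y <= np.exp(up)
        hits_Ey_bc     += bc * np.exp(lo) <= E_y <= bc * np.exp(up)

    print(hits_exp_Elogy / reps)  # about 0.95: [exp(lo), exp(up)] covers exp(E(log y))
    print(hits_Ey_naive / reps)   # well below 0.95: the same interval misses E(y)
    print(hits_Ey_bc / reps)      # about 0.95 again after multiplying the endpoints by E(exp(error))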

    I hope that makes sense.

    Concerning the source you cited: I think it's written a bit sloppily. By "quantiles", the author probably means quantiles of log y and y; if p is, say, the 5%-quantile of log(y), then exp(p) is the 5%-quantile of y. This is basically the same reasoning I used above to justify that no bias correction is needed for prediction intervals. But I also don't see how this applies to confidence intervals in general.
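
    To make that quantile reading concrete (again just a numerical sanity check with arbitrary values): exp() of the 5%-quantile of log y is, up to simulation error, the 5%-quantile of y.

    import numpy as np

    rng = np.random.default_rng(4)
    log_y = rng.normal(2.0, 0.5, size=1_000_000)
    y = np.exp(log_y)

    print(np.exp(np.quantile(log_y, 0.05)))  # exp of the 5%-quantile of log y
    print(np.quantile(y, 0.05))              # 5%-quantile of y: essentially the same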

    -Hans-



    ------------------------------
    Hans Kiesl
    Regensburg University of Applied Sciences
    Germany
    ------------------------------