ASA Connect

  • 1.  Question on Bias Correction in Log-Log Modeling

    Posted 8 days ago
    Edited by Isabella Ghement 7 days ago

    Hi everyone, 

    If we start with a log-log linear model of the form log(Y) = alpha + beta*log(X) + error, we know we must apply some form of bias correction (BC) when we back-transform the fitted relationship a + b*log(X) so that we are estimating the conditional mean of Y (rather than the conditional median of Y). Here, log is the natural logarithm.

    The bias-corrected point estimate would look like this: BC*exp(a + b*log(X)). For instance, if the error term is Normal with mean 0 and variance sigma^2, then BC = exp(0.5*sigma^2), which in practice is estimated by exp(0.5*s^2), where s^2 is the sample variance of the residuals.
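
    For concreteness, here is a small simulation sketch of this back-transformation bias (purely illustrative; the values alpha = 1, beta = 0.8, sigma = 0.5 and x0 = 5 are arbitrary, and the fit is ordinary least squares on the log-log scale):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000
    alpha, beta, sigma = 1.0, 0.8, 0.5

    x = rng.uniform(1, 10, size=n)
    log_y = alpha + beta * np.log(x) + rng.normal(0, sigma, size=n)

    # least-squares fit of log(Y) on log(X)
    X = np.column_stack([np.ones(n), np.log(x)])
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    a, b = coef
    s2 = np.var(log_y - X @ coef, ddof=2)   # sample variance of the residuals

    x0 = 5.0
    true_mean = np.exp(alpha + beta * np.log(x0) + 0.5 * sigma**2)  # E(Y | X = x0)
    naive     = np.exp(a + b * np.log(x0))       # plain back-transform (close to the conditional median)
    corrected = np.exp(0.5 * s2) * naive         # with the bias correction BC = exp(0.5*s^2)

    print(true_mean, naive, corrected)   # naive sits below true_mean; corrected is close to it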

    What intrigues me is this: why should the same bias correction NOT be applied to the endpoints of the confidence interval for the conditional mean of log(Y), or of the prediction interval for a new value of log(Y), when log(X) is known?

    The only explicit reference I could find backing up this advice comes from here: 

    https://stat.ethz.ch/education/semesters/as2015/asr/Script_v151119.pdf

    I confess I do not understand the justification offered there for only applying the bias correction to the point estimate but not to confidence intervals or prediction intervals centered around that estimate on the log-log scale. What quantiles are we dealing with here? (The only quantiles I can think of are the critical values involved in the construction of these intervals.) 

    Maybe I am missing something obvious here, but shouldn't we apply the bias correction both to the back-transformed point estimate AND to the back-transformed endpoints of the confidence/prediction intervals centered around that estimate? 

    Thanks in advance for helping me make sense of this. 

    Isabella 

     

     



    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 2.  RE: Question on Bias Correction in Log-Log Modeling

    Posted 6 days ago

    Hi Isabella,

    I think you are right concerning confidence intervals but wrong concerning prediction intervals. Or rather, it depends on the exact kind of confidence interval. 

    If log(y) = a + b log(x) + error, then y = exp(a + b log(x)) * exp(error). If error has mean 0, then exp(error) has mean > 1 (Jensen's inequality). Therefore (assuming x and error to be independent), E(y|x) is larger than exp(a + b log(x)) (= exp(E(log y|x))).

    If we have an unbiased estimate (e.g. the least squares estimate) for E(log y|x), the exponential of this estimate will be negatively biased for  E(y|x). Therefore, a bias correction is called for (e.g. the one you suggested).
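
    As a quick numerical sanity check of that step (illustrative only; sigma = 0.5 is arbitrary): for a Normal(0, sigma^2) error, E(exp(error)) = exp(sigma^2/2) > 1, which is exactly the correction factor you quoted.

    import numpy as np

    rng = np.random.default_rng(1)
    sigma = 0.5
    err = rng.normal(0, sigma, size=1_000_000)

    print(np.exp(err).mean())       # Monte Carlo estimate of E(exp(error)), about 1.13
    print(np.exp(0.5 * sigma**2))   # theoretical value exp(sigma^2/2)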

    Let's look at prediction intervals for log y and y (conditional on x), respectively.

    If [lo, up] is a 95% prediction interval for log y, this means that with 95% probability, the following inequality is true:

    lo <= log y <= up  (ineq 1)

    Since exp() is a strictly increasing function, (ineq 1) is equivalent to

    exp(lo) <= exp(log y) <= exp(up), or

    exp(lo) <= y <= exp(up)  (ineq 2).

    Since (ineq 1) and (ineq 2) are equivalent, they are either both true or both untrue. If (ineq 1) is true with probability 95%, so is (ineq 2). And this means that [exp(lo), exp(up)] is a 95% prediction interval for y. No bias correction needed.
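
    A tiny simulation confirms that nothing is lost in this back-transformation (illustrative only; for simplicity I treat the centre mu, which stands in for a + b*log(x0), and sigma as known, so the 95% limits are just mu +/- 1.96*sigma):

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma = 2.0, 0.5
    log_y_new = mu + rng.normal(0, sigma, size=100_000)
    y_new = np.exp(log_y_new)

    lo, up = mu - 1.96 * sigma, mu + 1.96 * sigma        # 95% prediction limits for log y
    inside_log = (lo <= log_y_new) & (log_y_new <= up)
    inside_raw = (np.exp(lo) <= y_new) & (y_new <= np.exp(up))

    print(inside_log.mean(), inside_raw.mean())   # both about 0.95
    print(np.array_equal(inside_log, inside_raw)) # True: same event, sample by sample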

    Let's now look at confidence intervals for E(log y) and E(y) (conditional on x), respectively.

    If [lo, up] is a 95% confidence interval for E(log y), this means that in 95% of all samples the following inequality is true:

    lo <= E(log y) <= up  (ineq 3).

    Since exp() is a strictly increasing function, (ineq 3) is equivalent to

    exp(lo) <= exp(E(log y)) <= exp(up)  (ineq 4).

    Since (ineq 3) and (ineq 4) are equivalent, they are either both true or both untrue. If (ineq 3) is true 95% of the time, so is (ineq 4). And this means that [exp(lo), exp(up)] is a 95% confidence interval for exp(E(log y)). No bias correction needed.

    BUT: we are (probably) not interested in a confidence interval for exp(E(log y)). We want a confidence interval for E(y). That is not the same thing: we know that E(y) = exp(E(log y)) * E(exp(error)). So let's multiply (ineq 4) by E(exp(error)):

    exp(lo)*E(exp(error)) <= exp(E(log y))*E(exp(error)) <= exp(up)*E(exp(error)), or

    exp(lo)*E(exp(error)) <= E(y) <= exp(up)*E(exp(error))  (ineq 5).

    And this means that [exp(lo)*E(exp(error)), exp(up)*E(exp(error))] is a 95% confidence interval for E(y). Here, we need bias correction for the endpoints. 
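
    Here is a small coverage simulation of exactly this contrast (illustrative only: it uses an intercept-only model log y = mu + error and multiplies the endpoints by the true E(exp(error)) = exp(sigma^2/2); in practice that factor would itself have to be estimated):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    mu, sigma, n, reps = 2.0, 0.5, 50, 20_000
    E_y = np.exp(mu + 0.5 * sigma**2)          # true E(y)
    bc = np.exp(0.5 * sigma**2)                # E(exp(error))

    hits_exp_Elogy = hits_Ey_naive = hits_Ey_bc = 0
    for _ in range(reps):
        log_y = rng.normal(mu, sigma, size=n)
        m, s = log_y.mean(), log_y.std(ddof=1)
        half = stats.t.ppf(0.975, df=n - 1) * s / np.sqrt(n)
        lo, up = m - half, m + half            # 95% confidence interval for E(log y) = mu
        hits_exp_Elogy += np.exp(lo) <= np.exp(mu) <= np.exp(up)
        hits_Ey_naive  += np.exp(lo) <= E_y <= np.exp(up)
        hits_Ey_bc     += bc * np.exp(lo) <= E_y <= bc * np.exp(up)

    print(hits_exp_Elogy / reps)  # about 0.95: [exp(lo), exp(up)] covers exp(E(log y))
    print(hits_Ey_naive / reps)   # well below 0.95: the same interval misses E(y)
    print(hits_Ey_bc / reps)      # about 0.95 again after multiplying the endpoints by E(exp(error))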

    I hope that makes sense.

    Concerning the source you cited: I think it's written a bit sloppily. By "quantiles", the author probably means quantiles of log y and y; if p is, say, the 5%-quantile of log(y), then exp(p) is the 5%-quantile of y. This is basically the same reasoning I used above to justify that no bias correction is needed for prediction intervals. But I also don't see how this applies to confidence intervals in general.
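
    To make that quantile reading concrete (again just a numerical sanity check with arbitrary values): exp() of the 5%-quantile of log y is, up to simulation error, the 5%-quantile of y.

    import numpy as np

    rng = np.random.default_rng(4)
    log_y = rng.normal(2.0, 0.5, size=1_000_000)
    y = np.exp(log_y)

    print(np.exp(np.quantile(log_y, 0.05)))  # exp of the 5%-quantile of log y
    print(np.quantile(y, 0.05))              # 5%-quantile of y: essentially the same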

    -Hans-



    ------------------------------
    Hans Kiesl
    Regensburg University of Applied Sciences
    Germany
    ------------------------------