Hi Isabella,
I think you are right concerning confidence intervals but wrong concerning prediction intervals. Or let's say it depends on the exact kind of confidence intervals.
If log(y) = a + b log(x) + error, then y = exp(a + b log(x)) * exp(error). If error has mean 0 (and is not constant), then exp(error) has mean > 1 (Jensen's inequality). Therefore (assuming x and error to be independent), E(y|x) is larger than exp(a + b log(x)) (= exp(E(log y|x))).
If we have an unbiased estimate (e.g. the least squares estimate) for E(log y|x), the exponential of this estimate will be negatively biased for E(y|x). Therefore, a bias correction is called for (e.g. the one you suggested).
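A small simulation can illustrate the Jensen argument. This is only a sketch: the coefficients a and b, the value of x, the error standard deviation and the normal error distribution are all made-up assumptions, not taken from your data.

```python
import math
import random

random.seed(0)

# Illustrative values only; none of them come from real data.
a, b, x = 1.0, 0.8, 10.0   # hypothetical coefficients and a fixed x
sigma = 0.5                # hypothetical error standard deviation

# Draw errors with mean 0 and average exp(error) over many draws.
errors = [random.gauss(0.0, sigma) for _ in range(200_000)]
mean_exp_error = sum(math.exp(e) for e in errors) / len(errors)

naive = math.exp(a + b * math.log(x))   # exp(E(log y|x))
mean_y = naive * mean_exp_error         # Monte Carlo estimate of E(y|x)

# For normal errors, E(exp(error)) = exp(sigma^2 / 2).
theory = math.exp(sigma ** 2 / 2)

print(f"mean of exp(error): {mean_exp_error:.4f} (theory {theory:.4f})")
print(f"exp(E(log y|x)):    {naive:.4f}")
print(f"E(y|x), simulated:  {mean_y:.4f}")
```

Under these assumptions the simulated mean of exp(error) comes out above 1, so exp(E(log y|x)) falls short of E(y|x).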
Let's look at prediction intervals for log y and y (conditional on x), respectively.
If [lo, up] is a 95% prediction interval for log y, this means that with 95% probability, the following inequality is true:
lo <= log y <= up (ineq 1)
Since exp(x) is a strictly increasing function, (ineq 1) is equivalent to
exp(lo) <= exp( log y) <= exp(up), or
exp(lo) <= y <= exp(up) (ineq 2).
Since (ineq 1) and (ineq 2) are equivalent, they are either both true or both untrue. If (ineq 1) is true with probability 95%, so is (ineq 2). And this means that [exp(lo), exp(up)] is a 95% prediction interval for y. No bias correction needed.
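The equivalence of (ineq 1) and (ineq 2) can be checked draw by draw. The following is a minimal sketch, assuming log y given x is normal with a known mean and standard deviation (both values are hypothetical), so that the 95% prediction interval for log y is simply mean plus or minus 1.96 standard deviations.

```python
import math
import random

random.seed(1)

mu, sigma = 2.0, 0.7   # hypothetical mean and sd of log y given x
z = 1.959964           # 97.5% quantile of the standard normal
lo, up = mu - z * sigma, mu + z * sigma   # prediction interval for log y

n = 100_000
hits_log = 0   # draws for which (ineq 1) holds
hits_y = 0     # draws for which (ineq 2) holds
for _ in range(n):
    log_y = random.gauss(mu, sigma)
    y = math.exp(log_y)
    hits_log += lo <= log_y <= up
    hits_y += math.exp(lo) <= y <= math.exp(up)

coverage_log = hits_log / n
coverage_y = hits_y / n
print(f"coverage on the log scale:      {coverage_log:.4f}")
print(f"coverage on the original scale: {coverage_y:.4f}")
```

Both counts agree because each draw satisfies either both inequalities or neither, so the back-transformed interval keeps its coverage without any correction.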
Let's now look at confidence intervals for E(log y) and E(y) (conditional on x), respectively.
If [lo, up] is a 95% confidence interval for E(log y), this means that in 95% of all samples the following inequality is true:
lo <= E(log y) <= up (ineq 3).
Since exp(x) is a strictly increasing function, (ineq 3) is equivalent to
exp(lo) <= exp( E(log y)) <= exp(up) (ineq 4).
Since (ineq 3) and (ineq 4) are equivalent, they are either both true or both untrue. If (ineq 3) is true 95% of the time, so is (ineq 4). And this means that [exp(lo), exp(up)] is a 95% confidence interval for exp( E(log y)). No bias correction needed.
BUT: we are (probably) not interested in a confidence interval for exp( E(log y)). We want a confidence interval for E(y). But this is not the same thing: we know that E(y) = exp( E(log y)) * E(exp(error)). So let's multiply (ineq 4) by E(exp(error)), which is positive:
exp(lo)*E(exp(error)) <= exp( E(log y))*E(exp(error)) <= exp(up)*E(exp(error)) , or
exp(lo)*E(exp(error)) <= E(y) <= exp(up)*E(exp(error)) (ineq 5).
And this means that [exp(lo)*E(exp(error)), exp(up)*E(exp(error))] is a 95% confidence interval for E(y). Here, we need bias correction for the endpoints.
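Here is a rough simulation of (ineq 5), again only a sketch under strong assumptions: normal errors, a known error standard deviation, and a known factor E(exp(error)) = exp(sigma^2 / 2). All numbers are hypothetical. In practice the factor has to be estimated, which adds uncertainty that this sketch ignores.

```python
import math
import random

random.seed(2)

mu, sigma = 1.0, 1.0   # hypothetical E(log y) and error sd (treated as known)
n, reps = 50, 4000     # sample size and number of simulated samples
z = 1.959964           # 97.5% quantile of the standard normal

factor = math.exp(sigma ** 2 / 2)   # E(exp(error)) for normal errors
mean_y = math.exp(mu) * factor      # true E(y)
half = z * sigma / math.sqrt(n)     # half-width of the CI for E(log y)

hits_plain = 0       # [exp(lo), exp(up)] covers E(y)
hits_corrected = 0   # [exp(lo)*factor, exp(up)*factor] covers E(y)
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    center = sum(sample) / n
    lo, up = center - half, center + half
    hits_plain += math.exp(lo) <= mean_y <= math.exp(up)
    hits_corrected += math.exp(lo) * factor <= mean_y <= math.exp(up) * factor

coverage_plain = hits_plain / reps
coverage_corrected = hits_corrected / reps
print(f"coverage of E(y), uncorrected endpoints: {coverage_plain:.3f}")
print(f"coverage of E(y), corrected endpoints:   {coverage_corrected:.3f}")
```

With these settings the corrected interval should cover E(y) in roughly 95% of the samples, while the uncorrected one covers it far less often.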
I hope that makes sense.
Concerning the source you cited: I think it's written a bit sloppily. With "quantiles", the author probably means quantiles of log y and y; if p is, say, the 5%-quantile of log(y), then exp(p) is the 5%-quantile of y. This is basically the same reasoning I used above for justifying that no bias correction is needed for prediction intervals. But I also don't see how this applies to confidence intervals in general.
-Hans-
------------------------------
Hans Kiesl
Regensburg University of Applied Sciences
Germany
------------------------------