  • 1.  Bootstrap Prediction Intervals for Regression Models

    Posted 08-25-2014 14:59
    Hi everyone,

    How does one construct bootstrap prediction intervals for a linear regression model of the form Y = alpha + beta*X + error when "case resampling" is used (i.e., X is assumed random)?  "Case resampling" involves creating bootstrap samples by sampling the (Y,X) observations with replacement.  

    I ask because several of the sources I have read argue that "case resampling" should be preferred in the presence of heteroskedasticity and/or residual correlation, both of which are concerns in my specific situation.

    As I understand it, constructing a prediction interval for a new value of Y should use a double bootstrap loop.  

    The first loop would be straightforward:  generate B bootstrap samples using "case resampling" and use them to produce B point forecasts of the new value of Y. 
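
    A minimal sketch of this first loop, assuming Python with NumPy and an ordinary least-squares line. The function name case_bootstrap_forecasts and its parameters are illustrative only and do not come from any source mentioned in this thread:

```python
import numpy as np

def case_bootstrap_forecasts(x, y, x_new, B=2000, seed=0):
    """Return B point forecasts of Y at x_new from case (pairs) resampling."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    forecasts = np.empty(B)
    for b in range(B):
        # Sample the (Y, X) observations jointly, with replacement.
        idx = rng.integers(0, n, size=n)
        # np.polyfit returns the highest power first: slope, then intercept.
        beta, alpha = np.polyfit(x[idx], y[idx], deg=1)
        forecasts[b] = alpha + beta * x_new
    return forecasts
```

    The spread of the returned forecasts reflects only the uncertainty in the estimated line, not the variability of a single new observation.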

    The second loop is what I am trying to figure out:  how do we factor in the additional variability associated with a single observation without "defeating" the "robustness" provided by "case resampling"?  (The second loop would add a resampled residual to the point forecast produced by the first loop.  When we choose to resample residuals, we implicitly assume that the fitted model is correct, something we didn't necessarily have to assume with "case resampling".)

    While "case resampling" in the first loop lets us ignore heteroskedasticity and/or residual correlation, can "residual resampling" in the second loop really ignore these issues?

    Any thoughts or references on how to best deal with this would be greatly appreciated. 

    Thanks,

    Isabella

    -------------------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    E-mail: Isabella@ghement.ca
    -------------------------------------------


  • 2.  RE: Bootstrap Prediction Intervals for Regression Models

    Posted 08-26-2014 10:24
    Isabella,

    A prediction interval is also known as a beta-expectation tolerance interval.  A great implementation of bootstrapping in tolerance interval construction has been provided by Hoffman, D. (2010) One-sided tolerance limits for balanced and unbalanced random effect models. Technometrics. 52:303-312.  There are other statistical methodologies for constructing prediction intervals.  For examples, please see Lin, T.Y. and Liao, C.T. (2006) A beta-expectation tolerance interval for general balanced mixed linear models. Comput. Stat. Data Anal. 50: 911-925; Lin, T.Y. and Liao, C.T. (2008) Prediction intervals for general balanced linear random models. J. Statist. Plann. Inference 138: 3164-3175.  An excellent review of computing prediction intervals is available in Sections 12.1 and 12.2 of Krishnamoorthy, K. and Mathew, T. (2009) Statistical tolerance regions: Theory, applications, and computation.

    -------------------------------------------
    Qing Kang
    Chief Scientist
    Statistical Intelligence Group, LLC
    -------------------------------------------




  • 3.  RE: Bootstrap Prediction Intervals for Regression Models

    Posted 08-26-2014 12:29
    Hi,

    I'm not familiar with the literature Isabella mentions, but I'm wondering whether the idea for the second loop is to somehow resample residuals from cases that are 'near' the Y value of interest.  For an (X, Y) sample from continuous data, it would seem that all resamples of cases with a specific Y would tend to be the very same (X, Y) case, with the same residual.  Isn't the idea to see how residuals tend to vary from one portion of the line to another (with, in practice, a bit of length to each portion)?
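
    One possible reading of this idea, sketched in Python with NumPy. It assumes that 'near' means the k resampled cases whose X values are closest to the new X, which is an interpretation and not something stated in the thread. The function name, the parameter k, and the percentile interval are all illustrative choices:

```python
import numpy as np

def local_residual_prediction_interval(x, y, x_new, k=30, B=2000,
                                       level=0.95, seed=0):
    """Percentile interval that draws residuals only from cases near x_new."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    draws = np.empty(B)
    for b in range(B):
        # Case resampling: draw (Y, X) pairs with replacement and refit.
        idx = rng.integers(0, n, size=n)
        xb, yb = x[idx], y[idx]
        beta, alpha = np.polyfit(xb, yb, deg=1)  # slope, then intercept
        resid = yb - (alpha + beta * xb)
        # Assumption: the local pool is the k resampled cases closest in X.
        nearest = np.argsort(np.abs(xb - x_new))[:k]
        draws[b] = alpha + beta * x_new + rng.choice(resid[nearest])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return float(lower), float(upper)
```

    The choice of k trades off locality against the number of distinct residuals in the pool; with a small k the pool may be dominated by repeated copies of a few cases.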

    Apologies if I'm missing something.

    -------------------------------------------
    William Goodman
    University of Ontario Institute of Technology
    -------------------------------------------




  • 4.  RE: Bootstrap Prediction Intervals for Regression Models

    Posted 08-26-2014 15:21
    Hi Isabella,

    As we know, a prediction interval should incorporate two sources of uncertainty: (i) uncertainty due to estimating the mean response of y_new, and (ii) uncertainty arising from the predictive probability density function (pdf) of y_new (or, equivalently, the predictive pdf of the corresponding regression error term). The B point forecasts of the new value of Y that you obtain in the 'first loop' already take care of (i). Now, because of case resampling, you really don't make any assumption about the pdf of the predictive regression error other than that it has mean zero. A nonparametric estimate of this pdf is available from the regression residuals that you obtain in each bootstrap iteration of the first loop. Note that these residuals are obtained over a range of X values and, therefore, incorporate any heteroskedasticity that may be present. Also, the resulting nonparametric pdf is not conditioned on any particular X value. Thus, in the second loop, if you add a randomly drawn residual to y_new_hat in each bootstrap iteration, you are not assuming any particular model for the error term. Therefore, the approach you are following sounds okay to me.
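
    A minimal sketch of the procedure described above, assuming Python with NumPy, an ordinary least-squares line, and a simple percentile interval. The function name and parameters are illustrative, and drawing one residual per bootstrap iteration is one reading of the description, not a definitive implementation:

```python
import numpy as np

def case_bootstrap_prediction_interval(x, y, x_new, B=2000,
                                       level=0.95, seed=0):
    """Percentile prediction interval from case resampling plus a
    residual drawn from the pooled residuals of each bootstrap fit."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    draws = np.empty(B)
    for b in range(B):
        # Source (i): refit the line on a resample of (Y, X) pairs.
        idx = rng.integers(0, n, size=n)
        xb, yb = x[idx], y[idx]
        beta, alpha = np.polyfit(xb, yb, deg=1)  # slope, then intercept
        # Source (ii): residuals of this fit, pooled over all X values.
        resid = yb - (alpha + beta * xb)
        draws[b] = alpha + beta * x_new + rng.choice(resid)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return float(lower), float(upper)
```

    Because the residuals are pooled over the whole range of X, the error component added here is the same at every x_new, consistent with the remark that the estimated pdf is not conditioned on any particular X value.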

    -------------------------------------------
    Khurram Nadeem
    Postdoctoral Fellow
    Acadia University
    -------------------------------------------