
  • 1.  confidence interval for GLM predicted value in R

    Posted 07-15-2011 15:29

    Hi all,

    I was trying to calculate confidence intervals for predicted values from a GLM with a log link function. A simplified version of the model is built as below (using the gam package for na.gam.replace):

    library(gam)  # provides na.gam.replace

    mdl <- glm(FEES ~ VAR1 + CAT2 + IND3,
               family = quasi(link = "log", variance = "mu"),
               data = test, na.action = na.gam.replace)

    I can think of two ways to calculate the CI for the fitted values.

    1) Get the SE on the response scale and build a symmetric interval around the fitted mean:

    yyy <- predict(mdl, type = "response", se.fit = TRUE)
    # 1.645 = qnorm(0.95), i.e. a two-sided 90% interval
    lower <- yyy$fit - 1.645 * yyy$se.fit
    upper <- yyy$fit + 1.645 * yyy$se.fit

    2) Get the SE on the link (linear predictor) scale, build the interval there, and transform it back with exp(). Of course, this won't produce intervals that are symmetric on the response scale.

    yy <- predict(mdl, type = "link", se.fit = TRUE)
    # interval on the linear-predictor scale, back-transformed with exp()
    ll <- exp(yy$fit - 1.645 * yy$se.fit)
    lu <- exp(yy$fit + 1.645 * yy$se.fit)
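    Both approaches can be tried end to end on toy data. This is a hypothetical sketch: the data frame `test`, its columns and the true coefficients are made up because the real data are not shown, and it drops `na.action = na.gam.replace` since the toy data have no missing values. It also checks numerically that `predict()` derives the response-scale SE from the link-scale SE by the delta method, which is why the two approaches come out so close.

```r
# Hypothetical toy data standing in for `test` (the real data are not shown).
set.seed(1)
n <- 200
test <- data.frame(VAR1 = runif(n),
                   CAT2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                   IND3 = rbinom(n, 1, 0.5))
mu <- exp(1 + 0.8 * test$VAR1 + 0.3 * (test$CAT2 == "b") - 0.2 * test$IND3)
test$FEES <- 2 * rpois(n, mu / 2)   # mean mu, variance 2 * mu

# Model of the same form as in the post (no gam dependency needed here).
mdl <- glm(FEES ~ VAR1 + CAT2 + IND3,
           family = quasi(link = "log", variance = "mu"), data = test)

z <- qnorm(0.95)   # 1.645: two-sided 90% intervals

yyy <- predict(mdl, type = "response", se.fit = TRUE)   # approach 1)
yy  <- predict(mdl, type = "link", se.fit = TRUE)       # approach 2)

# Delta method for the log link: d mu / d eta = exp(eta) = mu,
# so se(mu_hat) = mu_hat * se(eta_hat). predict() uses exactly this.
stopifnot(isTRUE(all.equal(unname(yyy$se.fit),
                           unname(yyy$fit * yy$se.fit))))

ci <- data.frame(
  lower1 = yyy$fit - z * yyy$se.fit,      # symmetric on the response scale
  upper1 = yyy$fit + z * yyy$se.fit,
  lower2 = exp(yy$fit - z * yy$se.fit),   # symmetric on the link scale
  upper2 = exp(yy$fit + z * yy$se.fit)
)
head(ci)
```

Comparing the `lower*` and `upper*` columns of `ci` row by row shows the same upward shift of approach 2) that is described below.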

    The results differ, but not by much. In the particular example I created, the bounds from 2) are shifted upward relative to 1): both the lower and the upper bounds are larger.
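    The upward shift is expected and can be derived directly, assuming (as R's predict() does for a GLM) that the response-scale SE in 1) is the delta-method SE. Write $\hat\eta$ for the fitted linear predictor, $s$ for its SE, $\hat\mu = e^{\hat\eta}$ for the fitted mean and $z = 1.645$:

```latex
\begin{aligned}
\text{1): } & \hat\mu \pm z\,\hat\mu s
  = \bigl[\hat\mu(1 - zs),\ \hat\mu(1 + zs)\bigr] \\
\text{2): } & \bigl[e^{\hat\eta - zs},\ e^{\hat\eta + zs}\bigr]
  = \bigl[\hat\mu e^{-zs},\ \hat\mu e^{zs}\bigr] \\
& e^{x} \ge 1 + x \ \text{for all } x
  \;\Rightarrow\; \hat\mu e^{-zs} \ge \hat\mu(1 - zs),
  \quad \hat\mu e^{zs} \ge \hat\mu(1 + zs)
\end{aligned}
```

    So both bounds of 2) are always at least as large as those of 1), and the difference is of order $\hat\mu (zs)^2/2$, which is small when $s$ is small. The width of 2), $2\hat\mu\sinh(zs)$, is also never smaller than the width $2\hat\mu zs$ of 1).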

    So which one is the more appropriate way to do it?

    Also, I think these are CIs for the mean predicted value. How do I construct an interval for a single predicted value, i.e. for a new observation?

    Any comments and suggestions are highly appreciated!

    Have a nice weekend,

    Ru


    -------------------------------------------
    Ru Sun
    Ernst & Young LLP
    -------------------------------------------


  • 2.  RE:confidence interval for GLM predicted value in R

    Posted 07-15-2011 16:39

    A symmetric confidence interval ought to work best when the distribution is in fact symmetric, which is probably closer to true on the linear scale than on the transformed scale. So my intuition is that approach (2) will work better.

    You could do a small simulation study to find out which approach gives you appropriate coverage rates. Or, if both give similar coverage, check which gives narrower confidence intervals on average on the response scale.
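    A minimal version of that simulation study might look like the sketch below. Everything in it is an assumption for illustration: one covariate `x`, hypothetical true coefficients, a response with mean mu and variance 2*mu (so the `quasi(variance = "mu")` model is correctly specified), and coverage of the true mean checked at a single hypothetical point `x = 0.9`.

```r
set.seed(2011)
n_sim <- 500
n <- 50
z <- qnorm(0.95)               # two-sided 90% intervals, as in the post
beta <- c(1, 0.8)              # hypothetical true coefficients
x0 <- data.frame(x = 0.9)      # hypothetical point at which to predict
mu0 <- exp(beta[1] + beta[2] * x0$x)   # true mean at x0

methods <- c("response", "link")
hit <- matrix(FALSE, n_sim, 2, dimnames = list(NULL, methods))
width <- matrix(NA_real_, n_sim, 2, dimnames = list(NULL, methods))

for (i in seq_len(n_sim)) {
  x <- runif(n)
  mu <- exp(beta[1] + beta[2] * x)
  y <- 2 * rpois(n, mu / 2)    # mean mu, variance 2 * mu
  fit <- glm(y ~ x, family = quasi(link = "log", variance = "mu"))

  pr <- predict(fit, newdata = x0, type = "response", se.fit = TRUE)
  pl <- predict(fit, newdata = x0, type = "link", se.fit = TRUE)
  ci1 <- pr$fit + c(-1, 1) * z * pr$se.fit          # approach (1)
  ci2 <- exp(pl$fit + c(-1, 1) * z * pl$se.fit)     # approach (2)

  # Does each interval cover the true mean at x0?
  hit[i, ] <- c(ci1[1] <= mu0 && mu0 <= ci1[2],
                ci2[1] <= mu0 && mu0 <= ci2[2])
  width[i, ] <- c(diff(ci1), diff(ci2))
}

coverage <- colMeans(hit)
avg_width <- colMeans(width)
print(rbind(coverage, avg_width))
```

Compare each `coverage` entry with the nominal 0.90 first; the `avg_width` row only matters if both are close to nominal.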

    You're right that this is a confidence interval for the model's predicted value (not just an average, but the model prediction given all the data). The distribution of the response itself would not be symmetric due to the log link. For a single new observation, work on the linear scale: you need an estimate of the replicate variance there (which here appears to depend on the mean), and then a symmetric prediction interval would be prediction +/- q * sqrt(replicate variance + se^2). Again, a small simulation may add confidence that you have everything right.
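    The prediction-interval recipe above can be sketched in R, with two assumptions that are not stated in the thread: the replicate variance on the linear scale is approximated by the delta method, Var(log Y) ~ Var(Y)/mu^2 = phi/mu for the `quasi(variance = "mu")` model, with phi taken from `summary(mdl)$dispersion`; and q is the normal quantile qnorm(0.95). The data are hypothetical toy data. Treat the result as an approximation to check by simulation, not an exact interval.

```r
# Hypothetical toy data standing in for `test`.
set.seed(42)
n <- 200
test <- data.frame(VAR1 = runif(n),
                   CAT2 = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                   IND3 = rbinom(n, 1, 0.5))
mu <- exp(1 + 0.8 * test$VAR1 + 0.3 * (test$CAT2 == "b") - 0.2 * test$IND3)
test$FEES <- 2 * rpois(n, mu / 2)   # mean mu, variance 2 * mu

mdl <- glm(FEES ~ VAR1 + CAT2 + IND3,
           family = quasi(link = "log", variance = "mu"), data = test)

phi <- summary(mdl)$dispersion   # estimated dispersion: Var(Y) = phi * mu
q <- qnorm(0.95)                 # two-sided 90%

pl <- predict(mdl, type = "link", se.fit = TRUE)
mu_hat <- exp(pl$fit)

# Assumed delta-method approximation of the replicate variance of log(Y).
rep_var_link <- phi / mu_hat

# Symmetric on the linear scale: prediction +/- q * sqrt(rep var + se^2)
half <- q * sqrt(rep_var_link + pl$se.fit^2)
pi_lower <- exp(pl$fit - half)
pi_upper <- exp(pl$fit + half)

# Confidence interval for the mean (approach 2 in the question), for comparison
ci_lower <- exp(pl$fit - q * pl$se.fit)
ci_upper <- exp(pl$fit + q * pl$se.fit)
```

Because the prediction interval only adds phi/mu under the square root, it contains the confidence interval for the mean at every point.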

    -Jim

    -------------------------------------------
    James Garrett
    Manager, R&D Statistics
    Becton Dickinson
    -------------------------------------------