  • 1.  Multiple Interaction Terms in a Model

    Posted 07-05-2012 17:06
    This message has been cross-posted to the following eGroups: Statistics in Epidemiology Section and Statistical Consulting Section.
    -------------------------------------------

    Dear All,
      I have tried to fit a regression model in which there are multiple interaction terms as shown below:

    logit(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x3*x4 + b6*x2*x4 + b7*x1*x4

    As you can see, x4 appears in every interaction term. Is this a useful model? When I get the coefficients, I recognise that I would get a single estimate for the effect of x4, namely its effect at the reference levels of x1, x2 and x3. I cannot assume that the effect of x4 is the same at all the level 1 (reference) categories.
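
    To make the concern concrete, here is a minimal Python sketch of how the combined model ties the effect of x4 to the levels of x1, x2 and x3. It assumes all four predictors are 0/1 indicators, and the coefficient values are invented purely for illustration; they are not estimates from any data.

```python
import math

# Hypothetical coefficients (invented for illustration) for
# logit(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x3*x4 + b6*x2*x4 + b7*x1*x4
b = {"b0": -1.0, "b1": 0.4, "b2": -0.3, "b3": 0.2,
     "b4": 0.5, "b5": 0.3, "b6": -0.2, "b7": 0.1}


def linear_predictor(x1, x2, x3, x4):
    """Log-odds of y for one combination of 0/1 predictor values."""
    return (b["b0"] + b["b1"] * x1 + b["b2"] * x2 + b["b3"] * x3 + b["b4"] * x4
            + b["b5"] * x3 * x4 + b["b6"] * x2 * x4 + b["b7"] * x1 * x4)


def x4_effect(x1, x2, x3):
    """Change in log-odds when x4 goes from 0 to 1 with x1, x2, x3 held fixed."""
    return b["b4"] + b["b5"] * x3 + b["b6"] * x2 + b["b7"] * x1


# The effect of x4 (and its odds ratio) differs across covariate patterns.
for levels in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]:
    effect = x4_effect(*levels)
    print(f"x1, x2, x3 = {levels}: log-odds effect of x4 = {effect:.2f}, "
          f"odds ratio = {math.exp(effect):.2f}")
```

    In this parameterisation b4 by itself is the log-odds effect of x4 only when x1 = x2 = x3 = 0; at any other combination the effect is b4 plus the relevant interaction coefficients.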

    What would  you suggest?  I am more comfortable doing a separate model for each interaction term as follows:

    logit(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x3*x4

    logit(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x2*x4

    logit(y) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4 + b5*x1*x4

    I have used Stata to carry out the analysis. I am using AIC and Hosmer-Lemeshow statistics to assess goodness of fit and to select the model that best describes the relationship between the outcome and the explanatory variables.
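
    As a rough illustration of the AIC comparison described above, the following NumPy sketch fits the three single-interaction models and the combined model to simulated data. Everything here is an assumption made for the example: the data are simulated with binary predictors, the "true" coefficients are invented, and the small Newton-Raphson fitter `fit_logit` is a hypothetical helper, not the Stata routine used in the actual analysis.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximised log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        weights = p * (1.0 - p)
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * weights[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


# Simulated data (illustrative only): four binary predictors and an outcome
# whose generating model contains just the x3*x4 interaction.
rng = np.random.default_rng(2012)
n = 2000
x1, x2, x3, x4 = (rng.integers(0, 2, size=n).astype(float) for _ in range(4))
eta_true = -0.5 + 0.4 * x1 - 0.3 * x2 + 0.2 * x3 + 0.5 * x4 + 0.8 * x3 * x4
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

main = np.column_stack([np.ones(n), x1, x2, x3, x4])
candidates = {
    "x3*x4 only": np.column_stack([main, x3 * x4]),
    "x2*x4 only": np.column_stack([main, x2 * x4]),
    "x1*x4 only": np.column_stack([main, x1 * x4]),
    "all three": np.column_stack([main, x3 * x4, x2 * x4, x1 * x4]),
}

results = {}
for name, X in candidates.items():
    beta, loglik = fit_logit(X, y)
    results[name] = {"beta": beta, "loglik": loglik, "aic": aic(loglik, X.shape[1])}
    print(f"{name:12s} logLik = {loglik:9.2f}   AIC = {results[name]['aic']:9.2f}")
```

    Each single-interaction model is nested in the combined model, so the combined model's log-likelihood can never be lower; AIC then charges it for the two extra parameters. The Hosmer-Lemeshow statistic is not shown in this sketch.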

    Looking forward to hearing from you.

    Regards.

    Novie





    -------------------------------------------
    Novie Younger-Coleman
    Statistician
    Tropical Medicine Research Institute, Jamaica
    -------------------------------------------


  • 2.  RE:Multiple Interaction Terms in a Model

    Posted 07-05-2012 17:56
    Dear Novie,

    It is always (in my opinion) best to have only one model. Whenever we have more than one model for the same DV, unless they constitute a well-designed system of equations, we are bound to run into problems.

    In your example, we can be just about certain that the three estimates you will produce of each of the coefficients b0, b1, b2, b3, and b4 will be different, and possibly even contradictory (of differing signs, for example). Then what?

    You can reduce the effects of collinearity, as suggested by another respondent in a different thread, by centering your independent variables. I recommend that you do this and stick with a single multivariate model rather than several.
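
    A small sketch of why centering helps, assuming for the sake of the example that two of the predictors are continuous and independent (the variable names, means and spreads below are invented): the product x1*x4 is strongly correlated with x1 when both variables sit far from zero, and much less so once each is centered at its mean.

```python
import numpy as np

# Simulated continuous predictors with non-zero means (illustrative values).
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(loc=10.0, scale=2.0, size=n)
x4 = rng.normal(loc=5.0, scale=1.0, size=n)


def corr(a, b):
    """Pearson correlation between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]


# Correlation between a main effect and the raw product term.
raw_corr = corr(x1, x1 * x4)

# Center each variable at its sample mean, then rebuild the product term.
x1_c = x1 - x1.mean()
x4_c = x4 - x4.mean()
centered_corr = corr(x1_c, x1_c * x4_c)

print(f"corr(x1, x1*x4) before centering: {raw_corr:.3f}")
print(f"corr(x1, x1*x4) after centering:  {centered_corr:.3f}")
```

    Centering does not change the fitted values or the interaction coefficient; it changes what the main-effect coefficients mean, which become effects at the mean of the other variable rather than at zero.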

    Best regards,

    -- Tom

    -------------------------------------------
    Thomas Sexton
    Professor and Associate Dean
    Stony Brook University
    -------------------------------------------








  • 3.  RE:Multiple Interaction Terms in a Model

    Posted 07-06-2012 09:52
    I think Thomas' statement is too broad. How we should model some relationship depends on what is known about the field and what our research questions are. There is a use for both exploratory and confirmatory modelling, and the distinction between them is not always crystalline.

    Further, there is (in many fields, anyway) rarely one "right" model; there are often several (or more) reasonable ones. This is probably least true in the "hard" sciences, where we may be modeling data to confirm a very specific hypothesis that is strongly based in theory.

    -------------------------------------------
    Peter Flom
    -------------------------------------------








  • 4.  RE:Multiple Interaction Terms in a Model

    Posted 07-06-2012 14:06
    I think it's critically important when considering models with interactions to understand the scientific questions to be asked, and to set up the analysis in advance to provide specific answers to those questions. This is probably easier in designed experiments, like clinical trials, than in observational studies. Two key choices to make that influence both analysis and interpretation of results: what is the reference group (intercept) and what comparisons will be pre-specified.

    Here's an example I advised on this week that may help to illustrate. Simple version summary: The experiment measured time people took to perform a visual task, varying two factors, each with three levels: the number of items on the screen, and the condition (simple, complicated one way, complicated even more). The participants also differ in the amount of imaging-based neuropathology identified. The questions of interest were: for people with minimal pathology, how does increasing the number of items affect time to completion? How does making the condition harder change time? Does the effect of number of items change when you are in a harder condition? And, of course, how does brain damage affect your overall performance - the "pure speed" test with an easy task, and as things get harder... Lots of effect modification here! Variance problems led to log transform of the time (we also looked at reciprocal but it didn't change findings materially.)

    The reference condition was thus chosen to be the minimum pathology level (we took the natural log of the quantitative pathology measure; there were no zeroes in the dataset, so a value of 1 went to 0 and became the reference - this turned out to be a reasonable choice on biological grounds). We defined a variable for "item level" coded 0, 1 and 2, making the lowest level the reference, and added an indicator variable for the highest level to see whether the trend might be non-linear. We did pretty much the same thing for the conditions, since they were also designed to be progressively harder. The model then had main effects for these variables, plus interactions between the pathology measure and each of the factors, and between the two factor-level trends. The results were readily interpretable. For healthy-brain people in the reference condition, increasing the number of items didn't slow their performance, but as the condition got harder, not only did the task take longer with the fewest items, but increasing the number of items now made it harder. And as pathology increased, the task took longer overall, and there were some interesting interactions with the factors, which I won't detail. (There was also a non-linearity with increasing items - at some point a few more items just don't add to time exponentially.)

    Key point of this lengthy example: choosing the reference values appropriately makes the model really nice. The zero main effect for items - at the reference level - is interpretable, and the interaction with condition difficulty makes sense too.
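
    The role of the reference value can be sketched with simulated data. This is not the study's data or model: the variable names, coefficient values and the single items-by-pathology interaction are all assumptions made for illustration. The sketch fits the same linear model for log time twice, once with ln(pathology) as is (reference at pathology = 1) and once with it shifted to its mean, and compares the "items" main effect.

```python
import numpy as np

# Simulated, purely illustrative data.
rng = np.random.default_rng(1)
n = 300
items = rng.integers(0, 3, size=n).astype(float)   # item level coded 0, 1, 2
pathology = rng.uniform(1.0, 20.0, size=n)         # hypothetical measure, minimum 1
ln_path = np.log(pathology)                        # ln(1) = 0 is the reference

# Invented coefficients: no items effect at the reference pathology level,
# but an items effect that grows with ln(pathology).
b_items, b_path, b_inter = 0.0, 0.3, 0.15
log_time = (1.0 + b_items * items + b_path * ln_path
            + b_inter * items * ln_path + rng.normal(0.0, 0.2, size=n))


def fit(z):
    """Least-squares fit of log_time on items, z and items*z."""
    X = np.column_stack([np.ones(n), items, z, items * z])
    beta, *_ = np.linalg.lstsq(X, log_time, rcond=None)
    return beta, X @ beta


shift = ln_path.mean()
beta_ref, fitted_ref = fit(ln_path)           # reference: pathology = 1
beta_ctr, fitted_ctr = fit(ln_path - shift)   # reference: mean ln(pathology)

print("items coefficient, reference at pathology = 1:     ", beta_ref[1])
print("items coefficient, reference at mean ln(pathology):", beta_ctr[1])
print("same fitted values:", np.allclose(fitted_ref, fitted_ctr))
```

    The two fits describe the data identically; only the meaning of the items main effect changes, since it is always the items effect at whatever pathology value is coded as zero.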

    -------------------------------------------
    Laurel Beckett
    -------------------------------------------