
  • 1.  Variable Selection for GEE model

    Posted 11-04-2013 16:33
    This message has been cross-posted to the following eGroups: Young Professionals Group and Statistical Consulting Section.
    -------------------------------------------
    Dear All,

    I am fitting a GEE model to my data. Let's say Y is my dependent variable and X1, X2, X3, X4, X5, X6 are the independent variables in the model.
    • If I use X1 as the sole predictor, it is significant; the same is true of X2 and X3, and the QIC value for each of these single-predictor models is 574.
    • In the model with X1-X6, only one of the three (X1, X2, X3) is significant (p-value 0.03), and one other variable, let's say X5, is also significant (p-value 0.04); QIC = 584.
    • However, if I keep X1, X2, X3 and X5, nothing is significant; QIC = 577.
    Does anyone know of any new developments in variable selection techniques for GEE? I understand there is an extension of Mallows' Cp statistic for GEE, but it hasn't been implemented in any statistical software, and there is a lot of negativity surrounding automated variable selection.

    I would really appreciate it if someone could point out the best techniques for identifying the best-fitting model (other than just comparing QIC; I feel there is something more going on).

    PS: To avoid multicollinearity between X1, X2 and X3, I have centered them, and my response is also centered around its mean.

    Thank you for going through my post and I look forward to the comments/suggestions from the group members.

    Best Regards,
    Tasneem

    -------------------------------------------
    Tasneem Zaihra
    Post Doctoral Fellow
    McGill University
    -------------------------------------------


  • 2.  RE:Variable Selection for GEE model

    Posted 11-04-2013 17:14
    OK, you centered the variables. That does not eliminate collinearity; it just reduces certain kinds of collinearity. Have you examined the VIF for the variables? Have you examined the covariance pattern for X1, X2, X3? It certainly sounds like a collinearity problem of some kind.
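
As a rough illustration of the VIF check suggested above, here is a minimal sketch in plain Python. The data are simulated and purely hypothetical (three predictors that share a common component); none of it comes from the original poster's data set. It computes VIF_j = 1 / (1 - R^2_j) by regressing each predictor on the others, and it also shows that mean-centering leaves the VIFs of linear terms unchanged.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def vif(columns):
    """Return VIF_j = 1 / (1 - R^2_j), regressing column j on the others plus an intercept."""
    n = len(columns[0])
    out = []
    for j, y in enumerate(columns):
        others = [[1.0] * n] + [c for k, c in enumerate(columns) if k != j]
        xtx = [[sum(u * v for u, v in zip(a, b)) for b in others] for a in others]
        xty = [sum(u * v for u, v in zip(a, y)) for a in others]
        beta = solve(xtx, xty)
        fitted = [sum(b * col[i] for b, col in zip(beta, others)) for i in range(n)]
        ybar = sum(y) / n
        ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
        ss_tot = sum((yi - ybar) ** 2 for yi in y)
        out.append(ss_tot / ss_res)  # algebraically equal to 1 / (1 - R^2)
    return out

def center(col):
    """Subtract the column mean from every value."""
    mean = sum(col) / len(col)
    return [v - mean for v in col]

# Hypothetical data: x1, x2, x3 are noisy copies of one shared component z.
random.seed(0)
n = 200
z = [random.gauss(10.0, 2.0) for _ in range(n)]
x1 = [v + random.gauss(0.0, 0.5) for v in z]
x2 = [v + random.gauss(0.0, 0.5) for v in z]
x3 = [v + random.gauss(0.0, 0.5) for v in z]

raw = [x1, x2, x3]
vif_raw = vif(raw)
vif_centered = vif([center(c) for c in raw])
print("VIF, raw:     ", [round(v, 2) for v in vif_raw])
print("VIF, centered:", [round(v, 2) for v in vif_centered])
```

Because each auxiliary regression includes an intercept, subtracting the mean from a predictor cannot change its R^2 against the others, so the two printed lists agree up to rounding.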

    -------------------------------------------
    Paul Thompson
    Director, Methodology and Data Analysis Center
    Sanford Research/USD
    -------------------------------------------








  • 3.  RE:Variable Selection for GEE model

    Posted 11-05-2013 12:05
    In addition to collinearity, you might consider the scientific relationships among these 6 covariates, as in any modeling project. For example: What happens in a model with just X1, X2, and X3, all of which are individually statistically significant? What is the relationship between X5 and the two variables among X1, X2, X3 that are not significant when you fit the model with all 6 independent variables? How do X4 and X6 relate to these variables?

    -------------------------------------------
    Alicia Toledano
    President
    Biostatistics Consulting, LLC
    -------------------------------------------








  • 4.  RE:Variable Selection for GEE model

    Posted 11-05-2013 11:51
    Full disclosure: I'm fairly suspicious of automated or algorithmic approaches to variable selection.
    I don't believe centering can "fix" a confound or collinearity...except under very particular circumstances.

    There are scenarios where one algorithm or another happens to be structured in a way that appropriately gets you through the phenomenon on which you're working...that is, the algorithms ARE taking a particular approach, envisioned by their creator(s). When you choose the algorithm, you choose their approach.

    I tend to think of it as parking your car on a hill, putting it in neutral, and getting out. You may not be "driving", but you're still responsible for understanding the predictable events you set in motion...and for that car landing in someone's living room.

    I tend to approach the problem more mechanistically. What are X1-X6? What is Y?

    Are X1-X3 imperfect measures of the same construct? (simple: use the average; complex: latent trait)
    Do X1-X3 represent items in a potential causal chain? (simple: use the one most proximal to Y; complex: path/SEM/causal)
    ...and so on.
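
To make the "simple: use the average" option concrete, here is a small hedged sketch on simulated data. It assumes, purely for illustration, that X1-X3 are equally noisy measures of a single underlying construct; the variable names and noise levels are hypothetical. Under that assumption, the average of the standardized measures tracks the construct more closely than any single measure does.

```python
import random
import statistics

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

def standardize(col):
    """Rescale a column to mean 0 and standard deviation 1."""
    m, s = statistics.mean(col), statistics.stdev(col)
    return [(v - m) / s for v in col]

# Hypothetical setup: one latent construct, three noisy measurements of it.
random.seed(1)
n = 500
construct = [random.gauss(0.0, 1.0) for _ in range(n)]
x1 = [c + random.gauss(0.0, 1.0) for c in construct]
x2 = [c + random.gauss(0.0, 1.0) for c in construct]
x3 = [c + random.gauss(0.0, 1.0) for c in construct]

# Composite score: average of the standardized measures.
z1, z2, z3 = (standardize(x) for x in (x1, x2, x3))
composite = [(a + b + c) / 3 for a, b, c in zip(z1, z2, z3)]

single = [corr(x, construct) for x in (x1, x2, x3)]
combined = corr(composite, construct)
print("single measures:", [round(r, 3) for r in single])
print("composite:      ", round(combined, 3))
```

The composite would then enter the model as one predictor in place of X1, X2 and X3. Whether that is appropriate depends on what the variables actually are, which is the question being asked above.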


    -------------------------------------------
    Jason T. Machan
    Director, Lifespan Biostatistics Core,
    Lifespan Hospital System
    Research Scientist, Biostatistics, Research
    Rhode Island Hospital
    Assistant Professor, Departments of Orthopaedics and Surgery
    The Warren Alpert Medical School, Brown University
    Director Biostatistics Externship, Adjunct Assistant Professor, Department of Psychology
    University of Rhode Island

    -------------------------------------------








  • 5.  RE:Variable Selection for GEE model

    Posted 11-05-2013 12:12
    Centering the X's helps especially if you are dealing with polynomial terms in the X's. I don't see much value in centering the Y's; indeed, if Y is non-normal, it may change the shape of the outcome dramatically and require you to use a different model.
    Clearly what you have is collinear variables. You ought to start with the most important variable (clinically or statistically; often the clinical importance is critical, depending on what you want to do) and then step up. (You may have to do it by hand; stepping down by hand is usually easier.)
    It is important to realize that, because of the collinearity, you will probably have many equivalent models that predict the results equally well. There is no unique model, so you need to take into account which of them are the most relevant and work from there.
    I would recommend Fitzmaurice, Laird and Ware's book (2nd ed.) as a reference.
    You don't say what distribution Y has, which is critical.
    I also don't use GEEs that much any more; GLMMs allow better parameterizations, especially for the variances and covariances.
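
The point above about centering and polynomial terms can be illustrated with a short sketch. The data are simulated and hypothetical (a predictor spread uniformly over 10 to 20). It compares the correlation between x and x^2 before and after centering x.

```python
import random

def corr(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

# Hypothetical predictor with a mean far from zero.
random.seed(2)
x = [random.uniform(10.0, 20.0) for _ in range(500)]

# Center x, then form the quadratic term from the centered values.
mean_x = sum(x) / len(x)
xc = [v - mean_x for v in x]

corr_raw = corr(x, [v ** 2 for v in x])
corr_centered = corr(xc, [v ** 2 for v in xc])
print("corr(x, x^2), raw:     ", round(corr_raw, 3))
print("corr(x, x^2), centered:", round(corr_centered, 3))
```

On the raw scale x and x^2 are almost perfectly correlated; after centering, the correlation is close to zero for a roughly symmetric predictor. This is the kind of collinearity that centering reduces. It does nothing for the correlation between two distinct predictors such as X1 and X2.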

    Ray

    -------------------------------------------
    Raymond Hoffmann
    Professor of Biostatistics in Pediatrics
    Medical College of Wisconsin
    -------------------------------------------