One situation where I think stepwise is fine is as follows. You have a dependent variable Y and an independent variable X that is of special interest. The question is whether Y is related to X after you control for other relevant variables. In this case, it feels legitimate to me to build a model for Y from all of the other variables (not including X). You could use stepwise or some other method. This initial step is not intended to get "the model" (capital letters) or even to test anything about the variables that are thrown in the pot. X is then added to that model. If, in this new model, X is meaningfully related to Y (e.g., a non-trivial coefficient for X) and the relationship is statistically significant, then that is evidence in favor of the hypothesis that the relationship of Y to X is real and not due to chance. On the other hand, if the coefficient of X is small and/or not statistically significant, that does not show that Y is unrelated to X in reality. The stepwise procedure may have introduced variables that are correlated with both X and Y and elbowed out X.
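For anyone who wants to see the two-step idea concretely, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated, the covariates z1 and z2 simply stand in for whatever the stepwise (or other) first step happened to retain, the helper names `invert` and `ols` are hypothetical, and the p-value uses a normal approximation rather than the t distribution.

```python
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, standard errors)."""
    design = [[1.0] + list(r) for r in rows]
    n, k = len(design), len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, d)) for d, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(k)]
    return beta, se


# Simulated data (an assumption): X is correlated with Z1 and has a true effect of 0.5 on Y.
rng = random.Random(0)
n = 500
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * a + rng.gauss(0, 1) for a in z1]
y = [1.0 + 2.0 * a - 1.0 * b + 0.5 * c + rng.gauss(0, 1)
     for a, b, c in zip(z1, z2, x)]

# Step 1: model for Y from the other variables only (X excluded).
base_beta, _ = ols(list(zip(z1, z2)), y)

# Step 2: add X to that model and look at its coefficient and significance.
full_beta, full_se = ols(list(zip(z1, z2, x)), y)
x_coef, x_se = full_beta[-1], full_se[-1]
z_stat = x_coef / x_se
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))  # normal approximation

print("covariate-only coefficients:", [round(b, 3) for b in base_beta])
print(f"X coefficient: {x_coef:.3f}  (SE {x_se:.3f}, z {z_stat:.2f}, p {p_value:.4f})")
```

Note that the sketch only illustrates the favorable case. A small or non-significant coefficient for X in step 2 would still not show that X is unrelated to Y, for the reason given above.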
Another case where I have used stepwise is to throw in all the independent variables, let them fight it out, and see how many variables survive. If none enter (forward selection) or none remain (backward selection), then I know that a more painstaking analysis of main effects is not likely to pay off.
In more than one of the comments offered during this thread, it was noted that some wisdom, intelligence and experience should be used in model-building. True! There are often choices that need to be made based on our collaborator's experience and not based on a particular statistical procedure.
Finally, I was surprised in looking through several good textbooks on regression that there was not enough space given to model-building (in my opinion). It is such a critical area of statistics, and we are building models all the time. Perhaps one of the reasons that it was not covered so thoroughly is that there is an important subjective component to model-building. Yes, subjective! That is hard to put into a text.
May I say that I have not perused Harrell's text, oft-cited here, and I look forward to doing that, given the high marks for it offered on this forum.
Your thoughts, colleagues?
Best wishes,
Nayak
-------------------------------------------
Nayak Polissar
Consultant
The Mountain Whisper Light
-------------------------------------------
Original Message:
Sent: 03-30-2012 18:50
From: Michael Chernick
Subject: MultiColinearity, Interactions, Quasi continuous data
I personally think stepwise procedures are fine. Any procedure that does a sequence of hypothesis tests has the multiple testing problem. My own research with Lacey Gunter does stepwise selection based on variables that qualitatively interact with treatment. We use stepwise procedures and adjust to control the FWER.
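As a generic illustration of what adjusting a sequence of tests to control the FWER can look like, here is a minimal sketch of Holm's step-down procedure. This is an assumption chosen for illustration, not necessarily the adjustment used in the research mentioned above; the function name `holm_reject` and the p-values are made up.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one reject/keep decision per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values from four candidate variables.
p_values = [0.001, 0.04, 0.03, 0.20]
decisions = holm_reject(p_values)
print(decisions)
```

With these made-up values only the first hypothesis is rejected, even though three of the four raw p-values fall below 0.05.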
Regarding the first-order interactions: if you have 20 variables, there are 190 possible pairwise interactions (20 × 19 / 2), and if you test all of them you are bound to get some significant ones by chance. Also, it is common to test for main effects first, because a significant interaction without a main effect is difficult to interpret and may not be real.
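The arithmetic behind this point can be checked quickly. Counting each pair once, 20 variables give C(20, 2) = 190 two-way interactions (380 is the count if order is taken to matter). The sketch below assumes a per-test level of 0.05, no true interactions, and independent tests; the independence assumption is a simplification, since interaction tests that share variables are correlated.

```python
from math import comb

n_vars = 20
alpha = 0.05  # assumed per-test significance level

n_pairs = comb(n_vars, 2)               # unordered pairs of variables
expected_false_pos = alpha * n_pairs    # expected spurious "hits" under the null
p_at_least_one = 1 - (1 - alpha) ** n_pairs  # assumes independent tests
bonferroni_level = alpha / n_pairs      # per-test level that keeps FWER <= alpha

print(f"pairwise interactions: {n_pairs}")
print(f"expected false positives: {expected_false_pos:.1f}")
print(f"P(at least one false positive): {p_at_least_one:.5f}")
print(f"Bonferroni per-test level: {bonferroni_level:.6f}")
```

Under these assumptions one would expect about 9.5 significant interactions by chance alone, which is the sense in which some are bound to turn up.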
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------