Ryan:
I am an economist who frequently has to build forecast models. I learned a long time ago that predicting the future growth rate of any economic variable is a fool's game. Instead, I use multiple (nonlinear) regression to generate forecast models for economic variables (GDP, company sales, productivity, and hundreds of others). Multicollinearity is almost always a problem, and almost always one that has to be ignored to some extent, because all industries, and all of the variables associated with them, are directly connected to the larger economic cycle. I frequently find that the best predictors of a given variable are so highly correlated with each other that I can't use more than one of them. Still, when I have multiple predictor variables that consistently reach cyclical peaks and troughs before the dependent variable does, I find it difficult to justify excluding them from the model simply because of a higher-than-desired P-value. Reminding myself that probability is not equal to reality helps.
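To make the multicollinearity point concrete, here is a minimal Python sketch, not anything taken from an actual forecast model. It assumes synthetic data: two hypothetical predictors, `x1` and `x2`, that both track the same 40-period cycle, plus an unrelated series, `x3`. It uses the standard two-predictor shortcut, where the variance inflation factor is 1 / (1 - r^2) and r is the correlation between the two predictors. With more than two predictors, the r^2 would have to come from regressing each predictor on all of the others.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when a model has exactly two predictors."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Synthetic data (an assumption for illustration): both predictors follow
# the same business cycle with a little independent noise added.
random.seed(0)
n = 200
cycle = [math.sin(2 * math.pi * t / 40) for t in range(n)]
x1 = [c + random.gauss(0, 0.1) for c in cycle]
x2 = [c + random.gauss(0, 0.1) for c in cycle]
x3 = [random.gauss(0, 1) for _ in range(n)]  # unrelated to the cycle

vif_cyclical = vif_two_predictors(x1, x2)   # large: x1 and x2 overlap heavily
vif_unrelated = vif_two_predictors(x1, x3)  # close to 1: little overlap

print(f"VIF, two cyclical predictors: {vif_cyclical:.1f}")
print(f"VIF, cyclical vs. unrelated:  {vif_unrelated:.2f}")
```

A large value for the cyclical pair is the numerical face of "I can't use more than one of them": the second predictor adds almost no independent information, so its coefficient is estimated imprecisely even if it is a perfectly good indicator on its own.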
What helps more is visualization. When a graph of the variables shows that one of them leads the other, but a low R squared says it isn't a predictor, I generally conclude that the method of least squares is telling me more about a disparity between the long-run trend and the short-run predictive consistency at cyclical turns than about whether the variable is useful to me as a leading indicator. Useful leading indicators are too valuable to ignore simply because a measure of probability says they aren't useful. In other words, I let the graph tell me if a low R squared is a false negative.
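One way to put a number on what the graph shows is to correlate the indicator with the dependent variable at several leads, rather than only in the same period. The sketch below is an illustration on made-up data, not a recommended procedure: `indicator` and `target` are hypothetical series on a 40-period cycle, and the target is constructed to turn 10 periods after the indicator. For a one-predictor regression, R squared equals the squared correlation, so `r2_same_period` is what a same-period fit would report.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def lead_correlations(indicator, target, max_lead):
    """Correlation of indicator[t] with target[t + k] for k = 0..max_lead."""
    result = {}
    for k in range(max_lead + 1):
        x = indicator[: len(indicator) - k]  # drop the last k observations
        y = target[k:]                       # drop the first k observations
        result[k] = pearson(x, y)
    return result


# Synthetic data (an assumption for illustration): the target repeats the
# indicator's cycle 10 periods later, i.e. a quarter of the 40-period cycle.
n = 200
indicator = [math.sin(2 * math.pi * t / 40) for t in range(n)]
target = [math.sin(2 * math.pi * (t - 10) / 40) for t in range(n)]

corr = lead_correlations(indicator, target, max_lead=15)
best_lead = max(corr, key=corr.get)   # lead with the highest correlation
r2_same_period = corr[0] ** 2         # R squared of a same-period fit

print(f"same-period R squared: {r2_same_period:.3f}")
print(f"best lead: {best_lead} periods, correlation {corr[best_lead]:.3f}")
```

In this constructed case the same-period R squared is essentially zero even though the indicator anticipates every turn, which is the kind of false negative a graph makes obvious. Real series carry trend and noise, so the contrast will be less clean.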
I realize my comments aren't specific to your email, but modelling is as much art as it is science. While it seems obvious to me that lying with statistics should be prevented to the extent possible, I remain just as concerned that sometimes the statistic is the lie. To be terminologically correct, P-values (and any other probabilistic measure of error) are always reliable, but sometimes they are not valid. To close, I think what you're asking is where we draw the line. For me, the line is five independent variables or fewer.
Happy holidays, Steve L-K
Stephen Latin-Kasper
Economist / CEO
414-779-0886
stephen@coherentforecasting.org