A few additional comments on the fixed v. random issue. As Schabenberger and Pierce (2002, Page 423) note, the problem goes back to Eisenhart's (1947) paper, which originated the terms and the corresponding analyses and suggested two competing criteria for making the selection: one based on whether or not the levels/subjects of a factor were randomly selected, and the other based on whether or not the inferences extend beyond the levels/subjects in the trial. I agree with Schabenberger and Pierce's (2002, Page 423) counsel: if the levels/subjects of a factor are a true random sample, there is little question that the factor is random; if they are not a true random sample but have a stochastic element, the factor can be considered fixed or random depending on the context. In fact, a factor may be fixed for some questions and random for others within the same trial. Generalized linear mixed model (GLMM) methods accommodate this seemingly paradoxical situation by providing estimable (BLUE) functions for broad inference space conclusions (random) and predictable (BLUP) functions for narrow inference space conclusions (fixed) (Schabenberger and Pierce, 2002, Pages 423-424; Stroup, 2013, Pages 90-99). For those who may be interested, GLMM also offers BLUP functions for intermediate inference space conclusions, which treat the levels of some factors as specific and those of others as random (Schabenberger and Pierce, 2002, Page 424). In balanced cases, the point estimates do not depend on whether effects are designated fixed or random, but the associated variances do. Thus, it is important to consider the number of levels required for sound estimates of the variance components. In general, the more the better, but when only a few levels are available, I believe it is still wise to treat them as random if they can be shown to be representative of the population to which inferences are being made.
Treating them as fixed almost certainly underestimates the variance or places awkward limitations on the conclusions.
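The point about balanced cases can be made concrete with a small calculation. This is a minimal sketch in Python with NumPy, assuming a balanced one-way layout with hypothetical data (4 levels, 3 replicates each) and simple method-of-moments (ANOVA) estimates rather than the REML fits that GLMM software would use; the function name `grand_mean_summary` is illustrative only. The grand mean is identical whether the levels are treated as fixed or random; only its standard error changes.

```python
import numpy as np

# Hypothetical balanced data: rows are levels (e.g., labs), columns are replicates.
y = np.array([
    [10.1, 9.8, 10.4],
    [12.0, 12.3, 11.7],
    [8.9, 9.2, 9.0],
    [11.1, 10.7, 11.3],
])

def grand_mean_summary(y):
    """Illustrative helper: grand mean and its SE under fixed vs random levels."""
    a, n = y.shape
    level_means = y.mean(axis=1)
    grand_mean = y.mean()  # same point estimate under either model
    # Mean squares from the one-way ANOVA
    ms_within = ((y - level_means[:, None]) ** 2).sum() / (a * (n - 1))
    ms_between = n * ((level_means - grand_mean) ** 2).sum() / (a - 1)
    # Fixed levels: only within-level noise enters (narrow inference to these levels)
    se_fixed = np.sqrt(ms_within / (a * n))
    # Random levels: Var(grand mean) = (n*sigma2_a + sigma2) / (a*n),
    # which MS_between / (a*n) estimates
    se_random = np.sqrt(ms_between / (a * n))
    # Method-of-moments between-level variance component, truncated at zero
    sigma2_a = max((ms_between - ms_within) / n, 0.0)
    return grand_mean, se_fixed, se_random, sigma2_a

mean, se_fixed, se_random, sigma2_a = grand_mean_summary(y)
print(f"grand mean        = {mean:.3f}")
print(f"SE, levels fixed  = {se_fixed:.3f}")
print(f"SE, levels random = {se_random:.3f}")
print(f"between-level variance component = {sigma2_a:.3f}")
```

With these hypothetical numbers the random-levels standard error is roughly eight times the fixed-levels one, because it carries the between-level variability that matters once inferences extend to levels not in the trial.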
In summary, it seems to me the goal of the fixed v. random exercise, like so much of statistics, is to ensure that our design and analysis adequately estimate the variances associated with our point estimates and/or hypothesis tests.
References
Eisenhart, C. 1947. The assumptions underlying the analysis of variance. Biometrics 3:1-21.
Schabenberger, O. and F.J. Pierce. 2002. Contemporary statistical models for the plant and soil sciences. CRC Press, Boca Raton, FL, USA.
Stroup, W.W. 2013. Generalized linear mixed models: Modern concepts, methods and applications. CRC Press, Boca Raton, FL, USA.
------------------------------
Jon Baldock, Ph.D.
Baldock Statistical Services
jon@jbstats.com
------------------------------
Original Message:
Sent: 01-02-2020 09:10
From: Philip Scinto
Subject: Explaining Random effects vs fixed effects in multilevel models
Dear Novie,
After all the responses, are you more clear or less clear? Jokes aside, this is a difficult concept to understand, let alone explain. I have pondered this topic for years. You have received some good advice, but it is up to you to determine what is 'fixed' and what is 'random' according to your study and your plan for the results. Fixed and random have nothing to do with what you can or cannot control.
A fixed effect is an effect that can be estimated and is STABLE for some defined period in the future. Design variables and covariates such as sex, age, and race are very likely fixed biological effects. However, as psycho/social effects, their time as fixed effects is limited because these effects can change. If you perform a study in 3 laboratories and those 3 are 3 among hundreds in the world, then I would likely treat LAB as a random effect if my interest was in outcomes in other labs in the future. However, if those 3 labs were the only 3 labs in the world, then I would likely treat LAB as a fixed effect, but only in the near term. Why only the near term? Because what is a lab? The lab itself is not an effect. It's the protocols, people, conditions, etc. These may remain stable for a period of time, but will change at some point in the future.
Reproducibility in science is an issue, not only because it is difficult to spell, but because we fail to include possible random effects in the study and/or treat random effects as fixed effects. You have to know WHY the variable has an effect to classify it as fixed versus random. If Lab A differs from Lab B because Lab A is located in a dry climate and Lab B in a humid one, then it is probably OK to call it a fixed effect. Better yet, instead of LAB, use humidity as a covariate. If, however, you do not know why the labs get different results, and your interest is in reproducibility, then it may be best to fit LAB as random. Glassware in experiments is a good example.
I use 10 test tubes in my experiment. Do I treat test tubes as fixed or random? On the one hand, these particular test tubes will never be used again, and in a future experiment all of the test tubes will be different. This variable is not stable over time and is likely a random effect. However, what if, from the history of test tube making, we know that test tubes are extremely uniform? Furthermore, when we run the experiment, there is a clear bias in 2 of the 10 test tubes: within each test tube, all of the other variables have similar estimated effects, but these 2 test tubes simply give higher results. In this case, I would argue that the 2 test tubes in question are not random effects but fixed effects. I would make my best effort to research why these tubes were different, but even if I couldn't find a reason, I would still fit them as fixed. Why? Because of the implications for the analysis: by not accounting for this special cause, we would be biasing the estimates of the other variables. OK, I've said enough. I hope that I have not added to the confusion. The point is that it is not easy, and you really have to think about each of your variables carefully before deciding.
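The point about biasing the other estimates can be seen in a toy regression. This is a minimal sketch in Python with NumPy, assuming hypothetical noise-free data in which the two flagged tubes happen to receive only the high dose of some treatment (an assumed unbalanced allocation); the tube indicator enters as a fixed term, and the helper `ols` is illustrative only. In a fully balanced allocation, omitting the indicator would mostly inflate the residual error rather than shift the dose estimate, so the bias shown here depends on that assumed imbalance.

```python
import numpy as np

# Hypothetical toy data: 10 test tubes, 2 observations each.
# Tubes 1-8 received dose 0 and dose 1; tubes 9 and 10 (the "special cause"
# tubes) are ASSUMED to have received only dose 1.
tube = np.repeat(np.arange(1, 11), 2)
dose = np.array([0, 1] * 8 + [1, 1] * 2, dtype=float)
special = np.isin(tube, [9, 10]).astype(float)

# Noise-free response: true dose effect 1.0; special tubes read 3.0 higher
y = 2.0 + 1.0 * dose + 3.0 * special

def ols(X, y):
    """Illustrative helper: least-squares coefficients for design matrix X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones_like(y)
without = ols(np.column_stack([ones, dose]), y)
with_dummy = ols(np.column_stack([ones, dose, special]), y)
print(f"dose effect ignoring special tubes:      {without[1]:.2f}")
print(f"dose effect with fixed special-tube term: {with_dummy[1]:.2f}")
```

Here the dose effect comes out as 2.00 when the two tubes are ignored and 1.00 when they get their own fixed term, because without that term the dose coefficient absorbs the tubes' offset.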
------------------------------
Philip R. Scinto
Senior Fellow
The Lubrizol Corporation
Original Message:
Sent: 12-28-2019 19:33
From: Novie Younger-Coleman
Subject: Explaining Random effects vs fixed effects in multilevel models
Dear Colleagues,
Can anyone recommend a suitable way of explaining random effects and fixed effects, as well as the basis for choosing the relevant variables for estimating them, to a client who has minimal knowledge of statistics?
I need to explain the value of using a certain variable as the basis for estimating a random effect, rather than using it as the basis for estimating a fixed effect in a model.
My explanation, thus far:
Using this variable to estimate fixed effects would lead to a large number of regression coefficients requiring estimation, and this, I believe, would yield estimates that are not efficient (i.e., their variances could be larger than they need to be), if they can be arrived at at all. The previous sentence is a reasonable explanation, but it might still be "above the client's head" (and may not be applicable to all situations involving clustering variables). So I wanted a more practical illustration using symbols from everyday life, if possible. Does anyone have any ideas?
In addition, I wish to recommend to the client that we estimate and examine predicted values of the outcome and random effects for each value of the clustering variable (used as the basis for the random effects) so that we could identify which clusters may "contribute" more to the estimate of the outcome value.
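For the per-cluster predictions proposed above, a minimal sketch may help show what a random effect "does": each cluster's predicted effect (BLUP) is its raw deviation from the overall mean, pulled toward zero more strongly the less data the cluster has. This assumes a random-intercept model with no covariates, hypothetical data, and variance components treated as known (in practice they would be estimated, e.g., by REML, giving empirical BLUPs). The function name `blup_random_intercepts` and the cluster labels are illustrative, not from any library.

```python
import numpy as np

# Hypothetical clusters of unequal size (e.g., clinics); values are outcomes.
clusters = {
    "A": [5.1, 4.8, 5.4, 5.0, 5.2, 4.9],
    "B": [6.3, 6.0],
    "C": [4.2, 4.5, 4.4, 4.1],
    "D": [7.0],
}

# Variance components are ASSUMED known here (hypothetical values).
sigma2_a = 0.5   # between-cluster variance
sigma2_e = 0.25  # within-cluster residual variance

def blup_random_intercepts(clusters, sigma2_a, sigma2_e):
    """Illustrative helper: GLS overall mean and BLUPs of random intercepts."""
    names = list(clusters)
    n = np.array([len(clusters[c]) for c in names], dtype=float)
    means = np.array([np.mean(clusters[c]) for c in names])
    # GLS estimate of the overall mean: cluster means weighted by 1 / Var(cluster mean)
    w = 1.0 / (sigma2_a + sigma2_e / n)
    mu = np.sum(w * means) / np.sum(w)
    # Shrinkage factor: near 1 for large, informative clusters; smaller for small ones
    k = sigma2_a / (sigma2_a + sigma2_e / n)
    raw_dev = means - mu  # deviation a separate fixed coefficient would report
    blup = k * raw_dev    # predicted random effect, pulled toward zero
    return mu, dict(zip(names, zip(n.astype(int), raw_dev, k, blup)))

mu, table = blup_random_intercepts(clusters, sigma2_a, sigma2_e)
print(f"overall mean (GLS): {mu:.3f}")
for name, (size, dev, shrink, u) in table.items():
    print(f"{name}: n={size}  raw dev={dev:+.3f}  shrink={shrink:.2f}  BLUP={u:+.3f}")
```

Cluster D, with a single observation, is shrunk the most. In everyday terms, a cluster with little data is pulled toward the average instead of having its one observation taken at face value, which is what a separate fixed coefficient would do.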
Another question: From your experience, what are the strengths and weaknesses of using a variable as a random effect and as a fixed effect in the same model?
Any feedback is welcome.
Regards
Novie
------------------------------
Novie Younger-Coleman
Statistician
Caribbean Institute for Health Research, UWI, Mona, Jamaica
------------------------------