Explaining Random effects vs fixed effects in multilevel models

  • 1.  Explaining Random effects vs fixed effects in multilevel models

    Posted 12-28-2019 19:33
    Dear Colleagues,

    Can anyone recommend a suitable way of explaining random effects and fixed effects, as well as the basis for choosing the relevant variables for estimating these, to a client who has minimal knowledge of statistics?

    I need to explain the value of using a certain variable as the basis for estimating a random effect, rather than using it as the basis for estimating a fixed effect in a model.

    My explanation, thus far:
    Using this variable to estimate fixed effects would lead to a large number of regression coefficients requiring estimation, and this, I believe, would yield estimates that are not efficient (i.e., variances could be larger than they need to be), if they can be obtained at all. This is a reasonable explanation, but it might still be "above the client's head" (and may not be applicable to all situations for clustering variables). So I wanted a more practical illustration using symbols from everyday life, if possible. Does anyone have any ideas?

    In addition, I wish to recommend to the client that we estimate and examine predicted values of the outcome and of the random effects for each value of the clustering variable (used as the basis for the random effects), so that we can identify which clusters may "contribute" more to the estimate of the outcome value.
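
    One way to make the predicted cluster values concrete is the shrinkage calculation for a random-intercept model. The sketch below is only an illustration under stated assumptions: the clinic data are hypothetical, the between-cluster and within-cluster variances are treated as known rather than estimated, and the overall mean is taken as the plain average of all observations. The function name `predict_cluster_means` is made up for this example.

```python
from statistics import mean


def predict_cluster_means(clusters, var_between, var_within):
    """Shrink each cluster's raw mean toward the overall mean.

    Both variance components are treated as known (an assumption);
    a mixed-model routine would normally estimate them from the data.
    """
    # Simplification: plain average of all observations as the overall mean.
    overall = mean(y for ys in clusters.values() for y in ys)
    predictions = {}
    for name, ys in clusters.items():
        raw = mean(ys)
        # Weight on the cluster's own data: close to 1 for large clusters,
        # smaller for clusters with few observations.
        weight = var_between / (var_between + var_within / len(ys))
        predictions[name] = {
            "raw_mean": raw,
            "predicted_effect": weight * (raw - overall),
            "predicted_mean": overall + weight * (raw - overall),
        }
    return overall, predictions


# Hypothetical data: clinic A has 2 patients, clinic B has 20.
clinics = {"A": [10.0, 14.0], "B": [5.0, 7.0] * 10}
overall_mean, predicted = predict_cluster_means(
    clinics, var_between=4.0, var_within=16.0
)
for name, row in predicted.items():
    print(name, round(row["raw_mean"], 2), round(row["predicted_mean"], 2))
```

    In this toy example the 2-patient clinic is pulled much further toward the overall mean than the 20-patient clinic. That may be a usable everyday picture for a client: a random effect lets small clusters borrow information from the rest instead of each cluster getting its own free coefficient.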

    Another question:  From your experience, what are the strengths and weaknesses of using a variable as a random effect and as a fixed effect in the same model?


    Any feedback is welcome.

    Regards

    Novie



    ------------------------------
    Novie Younger-Coleman
    Statistician
    Caribbean Institute for Health Research, UWI, Mona, Jamaica
    ------------------------------


  • 2.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-28-2019 19:48
    Andrew Gelman has a well-written discussion of this on his blog, https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/, and in a paper in the Annals: https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/

    ------------------------------
    Chris Barker, Ph.D.
    Consultant and
    Adjunct Associate Professor of Biostatistics


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    ------------------------------



  • 3.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-28-2019 20:22
    One of the easiest ways to explain it to a non-statistician is that random effects change randomly and you can't control them. A fixed effect is something that you, the researcher, change as needed.

    Depending upon the circumstances, I'd say things like dose level, hospital, and gender are all fixed effects, because you said you need a certain number of male and female participants, you have certain hospitals/clinics you work with, and you need to know the effect of different doses of medication.

    On the other hand, your patients might have different BMI, race, age, etc. Since you can't control those, they would be random effects.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 4.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-28-2019 20:39
    Following on Andrew's comment, in the design of the experiment, the fixed effects are at levels that you choose (and usually choose to be the same across other factors) while the random effects have values that represent a random sample from the population of values.  So, we are interested in means for fixed factors and variances for the random factors.  The behavior from random factors inflates the variation we see and we want to quantify that inflation.

    ------------------------------
    John Grice
    Six Sigma Statistician
    Honeywell International (ret.)
    ------------------------------



  • 5.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-29-2019 09:42
    Fixed vs. random effects is not as clear-cut as stated.  The key is you are making an inference to a population about some outcome.

    What is the population you are targeting?

    Example: If the population of interest is all teaching hospitals in the US, then hospital (assuming the participating teaching hospitals are considered to be a random sample of the population of interest) is a random effect.  If your inference is intended to apply only to the hospitals in your study, it is a fixed effect.

    Assuming that your inference on the main outcome is targeted to the general population, then covariates such as gender, race/ethnicity, and years of age are fixed effects.

    Fixed vs. Random is not necessarily about what you control or not in the study.

    Interest in the random effects is far more than merely their variability.  Interaction with fixed effects can be hugely informative.


    Jonathan J. Shuster, PhD
    Univ. of Florida




    ------------------------------
    Jon Shuster
    ------------------------------



  • 6.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-29-2019 13:59
    Thanks Jonathan. A follow-up to your post.

    Following the hospital example about random effects, say we wanted to make an inference about gender and how it is associated with the outcome of interest. Wouldn't gender also be a random effect, since we are not interested in making an inference to just the participants in our study but to all males and females in the population?

    With this logic it would seem that all effects in a model should be random. Am I missing something?

    Thank you!

    Please excuse autocorrect and typos






  • 7.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-01-2020 10:02

    Dear Colleagues,

    Thanks very much for all the responses.  

    I believe the choice of a variable as the basis for a random effect can be guided by whether its values (as they exist in the data set) represent a random sample from a population of such values.

    I would not subscribe to gender being a random effect, as there is a limit on the possible unique gender "values" that exist, and for most analyses either there is a deliberate restriction on the gender categories analysed or all possible categories are included in the analyses.

    Ultimately, I believe there will be a need for some model adequacy assessment to determine the variable that is best utilised for estimation of the random effect.

    I also continue to assume that determination of the fixed effects can be guided by the research question. 


    Thanks, again, for all the responses. 

    Regards

    Novie



    ------------------------------
    Novie Younger-Coleman
    Statistician
    Caribbean Institute for Health Research, UWI, Mona, Jamaica
    ------------------------------



  • 8.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-01-2020 10:17
    I'm not on the experimental level of my colleagues here (my background is educational program evaluation) so I mostly lurk on the list to learn as much as I can.

    My question about random effects: Isn't it an assumption or requirement of these models that participants be randomly assigned to the levels of the variable? In that case gender cannot be a random effect; rather, it is a subject variable with inherent confounds. My understanding is that random effects involve the selection, and random assignment to, random levels of a larger independent variable. Fixed effects involve random assignment to specific levels of interest that are not randomly chosen from a larger set of possible levels.

    I would appreciate clarification of this issue. Thank you for the discussion!

    ------------------------------
    Annette Gourgey
    CUNY
    ------------------------------



  • 9.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-01-2020 20:18
    What you are seeing here is how poorly defined certain terms are and how different perspectives of the same idea change the meaning of a given term. 

    Depending upon how you view a certain variable, you can make a valid argument for it to be either fixed or random. 

    If you look at my original definition (fixed effects can be changed by the researcher, random effects cannot), that would imply all variables in observational studies are random effects. Which, you could argue, is true.

    However, those who do observational studies will cringe at that idea. Some will say fixed effects are the variables of interest. Covariates, variables you might want to control for but that are not part of the study, would be random effects.

    Others will argue that covariates are NOT random variables... or not always random variables. 

    Which brings us back to the original question, "What is a fixed and a random variable?" 

    Perhaps I should change my original definition. A fixed variable is one you can control. A fixed variable might be one that you are interested in.... or not. A random variable is one that you can't control. A variable that allows you some control, but not complete control, and may or may not fit into other common uses... well, "Good Luck!"

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 10.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 12-30-2019 13:14
    I think it is time to have a serious discussion about this. The random/fixed paradigm is very useful when 1) the fixed effects are either categories or precise numerical levels and 2) there is a physical reason to assume that the random effects are not of interest. For example, well-plates that contain multiple specimens with different treatments contribute a plate-to-plate variance component, but the plates and the processing of the plates are fairly standard. When humans are the random effects, we expect that the study subjects are selected because of a common skill (think of machine operators in the well-plate experiment) or medical status (healthy, patient, sub-group).

    I work in bio-processing development now and the paradigm is no longer working. For example, some of our biological effects can only be targeted, not precisely fixed. So the engineers can target levels of 15, 20 or 25 but get values of 14, 21, or 26. This is a planned effect, not observational, but not a fixed effect either. Also, we obtain human tissue from donors, so the donor should be a random effect. But often we only use a few donors. I started treating these as fixed effects because using random effects yields wide confidence interval estimates on the variance component. I heard that some places don't use random effects until there are at least 6 donors or lots. I don't like heuristic rules, but what is the value of a random effect estimate for a very low n?
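
    The concern about very few donors can be illustrated with a small simulation. This is a rough sketch, not anyone's production analysis: the balanced design (5 measurements per donor), the true standard deviations of 1.0, the seed, and the helper names `between_donor_variance` and `middle_90_width` are all assumptions chosen for illustration. It uses the simple ANOVA (method-of-moments) estimator of the donor variance rather than REML.

```python
import random
from statistics import mean, quantiles


def between_donor_variance(rng, n_donors, n_per_donor, sd_between, sd_within):
    """Simulate one balanced study and return the ANOVA (method-of-moments)
    estimate of the donor-to-donor variance. The estimate can be negative."""
    groups = []
    for _ in range(n_donors):
        donor_effect = rng.gauss(0.0, sd_between)
        groups.append(
            [donor_effect + rng.gauss(0.0, sd_within) for _ in range(n_per_donor)]
        )
    donor_means = [mean(g) for g in groups]
    grand = mean(donor_means)
    ms_between = (
        n_per_donor * sum((m - grand) ** 2 for m in donor_means) / (n_donors - 1)
    )
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, donor_means) for y in g
    ) / (n_donors * (n_per_donor - 1))
    return (ms_between - ms_within) / n_per_donor


def middle_90_width(values):
    """Width of the interval between the 5th and 95th percentiles."""
    cuts = quantiles(values, n=20)  # 19 cut points at 5%, 10%, ..., 95%
    return cuts[-1] - cuts[0]


rng = random.Random(2020)  # fixed seed so the run is repeatable
# Assumed design: true donor variance is 1.0, residual variance is 1.0.
settings = dict(n_per_donor=5, sd_between=1.0, sd_within=1.0)
few = [between_donor_variance(rng, n_donors=3, **settings) for _ in range(1000)]
many = [between_donor_variance(rng, n_donors=30, **settings) for _ in range(1000)]
width_few, width_many = middle_90_width(few), middle_90_width(many)
print(f"3 donors:  middle 90% of estimates spans a width of {width_few:.2f}")
print(f"30 donors: middle 90% of estimates spans a width of {width_many:.2f}")
```

    Under these assumptions the estimates from 3 donors scatter far more widely around the true value than those from 30 donors, which is the behaviour behind the wide confidence intervals described above. It does not by itself settle whether a cutoff such as 6 donors is the right rule.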

    Georgette Asherman
    Senior Statistician





    ------------------------------
    Georgette Asherman
    ------------------------------



  • 11.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-02-2020 09:10
    Dear Novie,

    After all the responses, are you more clear or less clear? Jokes aside, this is a difficult concept to understand, let alone explain. I have pondered this topic for years. You have received some good advice, but it is up to you to determine what is 'fixed' and what is 'random' according to your study and your plan for the results.

    Fixed and random have nothing to do with what you can control or cannot control. A fixed effect is an effect that can be estimated and is STABLE for some defined period in the future. Design variables and covariates such as sex, age, and race are very likely fixed biological effects. However, as psycho/socio effects their time as fixed effects is limited, because these effects can change.

    If you perform a study in 3 laboratories and those 3 are 3 among hundreds in the world, then I would likely treat LAB as a random effect if my interest was in outcomes in other labs in the future. However, if those 3 labs were the only 3 labs in the world, then I would likely treat LAB as a fixed effect, but only in the near term. Why only the near term? Because what is a lab? The lab itself is not an effect. It's the protocols, people, conditions, etc. These may remain stable for a period of time, but will change at some point in the future. Reproducibility in science is an issue, not only because it is difficult to spell, but because we either do not include possible random effects in the study and/or treat random effects as fixed effects.

    You have to know WHY the variable has an effect to classify it as fixed versus random. If Lab A is different from Lab B because Lab A is located in a dry climate and Lab B is located in a humid environment, then it is probably OK to call it a fixed effect. Better yet, instead of LAB, use humidity as a covariate. If, however, you do not know why the labs get different results, and your interest is in reproducibility, then it may be best to fit it as random.

    Glassware in experiments is a good example. I use 10 test tubes in my experiment. Do I treat test tubes as fixed or random? On the one hand, these particular test tubes will never be used again, and in a future experiment all of the test tubes will be different. This variable is not stable over time and is likely a random effect. However, what if the history of test tube making tells us that test tubes are extremely uniform? Furthermore, suppose that when we run the experiment there is a clear bias in 2 out of the 10 test tubes. Within each test tube, all of the other variables have similar estimates of their effect, but these 2 test tubes just give higher results. In this case, I would argue that the 2 test tubes in question are not random effects, but fixed effects. I would make my best effort to research why these tubes were different, but even if I couldn't find a reason, I would still fit them as fixed. Why? Because of the implications in the analysis. We would be biasing the estimates of the other variables by not accounting for this special cause.

    OK, I've said enough. I hope that I have not added to the confusion. The point is that it is not easy and that you really have to think about each of your variables carefully before deciding.

    ------------------------------
    Philip R. Scinto
    Senior Fellow
    The Lubrizol Corporation
    ------------------------------



  • 12.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-02-2020 12:36

    A few additional comments on the fixed v. random issue. As Schabenberger and Pierce (2002, Page 423) note, the problem goes back to Eisenhart's (1947) paper originating the terms and corresponding analyses, in which he suggested two competing criteria for making the selection: one based on whether or not the levels/subjects in a factor were randomly selected, and the other based on whether or not the inferences were extrapolations beyond the levels/subjects in the trial. I agree with Schabenberger and Pierce's (2002, Page 423) counsel: if the levels/subjects for a factor are a true random sample, there is little question the factor is random, and if the levels/subjects are not a true random sample but have a stochastic element, then the factor can be considered fixed or random depending on the context.

    In fact, a factor may be fixed for some questions and random for others within the same trial. Generalized linear mixed model (GLMM) methods allow for this paradoxical situation by providing estimable (BLUE) functions for broad inference space conclusions (random) and predictable (BLUP) functions for narrow inference space conclusions (fixed) (Schabenberger and Pierce, 2002, Pages 423-424; Stroup, 2013, Pages 90-99). For those who may be interested, GLMM also offers BLUP functions to make intermediate inference space conclusions about specific levels for some factors and random levels for others (Schabenberger and Pierce, 2002, Page 424).

    In balanced cases, the point estimates do not depend on the selection of fixed or random effects, but the associated variances do. Thus, it is important to consider the number of levels required for sound estimates of the variance components. In general, the more the better, but when only a few levels are available, I believe it is still wise to treat them as random if it is possible to show they are representative of the population to which inferences are being made. To regard them as fixed almost certainly underestimates the variance or causes awkward limitations to the conclusions.

     

    In summary, it seems to me the goal of the fixed v. random exercise, like so much of statistics, is to ensure that our design and analysis adequately estimate the variances associated with our point estimates and/or hypothesis tests.

     

    References

     

    Eisenhart, C.  1947.  The assumptions underlying the analysis of variance.  Biometrics 3:1-21.

    Schabenberger, O. and F.J. Pierce.  2002.  Contemporary statistical models for the plant and soil sciences.  CRC Press.  Boca Raton, FL, USA.

    Stroup, W.W.  2013.  Generalized linear mixed models:  Modern concepts, methods and applications.  CRC Press.  Boca Raton, FL, USA.



    ------------------------------
    Jon Baldock, Ph.D.
    Baldock Statistical Services
    jon@jbstats.com
    ------------------------------



  • 13.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-05-2020 08:12
    Dear Colleagues,

    Thanks, again, for all the responses.

    Inspired by your varied responses to my queries regarding random and fixed effects, I decided to give an academic seminar based on the same topic and I am inviting you to join.

    The Zoom link to the meeting is shown below.

    ********************************************
    Marshall Tulloch-Reid (on behalf of Novie Younger-Coleman) is inviting you to a scheduled Zoom meeting.


    Topic: CAIHR Academic Seminar - "Random Effects vs Fixed Effects - to be or not to be"
    Time: Jan 6, 2020 03:15 PM (EST)

    Join Zoom Meeting
    https://uwi.zoom.us/j/106754667
    Meeting ID: 106 754 667
    ********************************************

    Hope you will be able to join. Your feedback will be valued highly.

    Regards

    Novie


    ------------------------------
    Novie Younger-Coleman
    Statistician
    Caribbean Institute for Health Research, UWI, Mona, Jamaica
    ------------------------------



  • 14.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-05-2020 19:21
    Novie, will the webinar be recorded and available for viewing?


    Please excuse autocorrect and typos






  • 15.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-06-2020 13:00
    Yes, it normally is recorded and can be shared for viewing afterwards.

    ------------------------------
    Novie Younger-Coleman
    Statistician
    Caribbean Institute for Health Research, UWI, Mona, Jamaica
    ------------------------------



  • 16.  RE: Explaining Random effects vs fixed effects in multilevel models

    Posted 01-02-2020 13:07
    Novie,

    You also asked about the advantages and disadvantages of including a variable as both a fixed and a random factor. I think they are the same as for adding any factor: do you have a sufficient number of levels/subjects to get a good estimate of its effect/variance, and does it account for a large enough portion of the effect/variance to offset the additional complexity in the model? In my field, agriculture, we sometimes include year as both a fixed and a random factor. The fixed variable, say YR, accounts for the improvement in genetics and husbandry, which for corn is about 2 bushels/acre/year. The random variable, say YEAR, primarily accounts for the variable weather conditions.
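
    The two roles of year can be sketched with simulated yields. This is a rough two-step illustration, not a mixed-model fit: only the 2 bushels/acre/year trend comes from the text above, while the baseline yield of 150, the weather and field standard deviations, the 30 years, the 4 fields per year, and the seed are all assumed values. The slope of a least-squares line through the year means plays the part of the fixed YR trend, and the scatter of the year means around that line reflects the random YEAR component (plus a little averaged field noise).

```python
import random
from statistics import mean

rng = random.Random(1947)  # fixed seed so the run is repeatable
TREND = 2.0        # bushels/acre/year, the figure quoted above
WEATHER_SD = 10.0  # assumed year-to-year weather effect (random YEAR)
FIELD_SD = 5.0     # assumed field-to-field noise within a year

years = list(range(30))
year_means = []
for yr in years:
    weather = rng.gauss(0.0, WEATHER_SD)  # one random effect per year
    fields = [
        150.0 + TREND * yr + weather + rng.gauss(0.0, FIELD_SD) for _ in range(4)
    ]
    year_means.append(mean(fields))

# Least-squares line through the year means: the systematic (fixed) part.
x_bar, y_bar = mean(years), mean(year_means)
sxx = sum((x - x_bar) ** 2 for x in years)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, year_means)) / sxx
intercept = y_bar - slope * x_bar

# Scatter of the year means around the trend: mostly the random year effect.
residuals = [y - (intercept + slope * x) for x, y in zip(years, year_means)]
resid_sd = (sum(r ** 2 for r in residuals) / (len(years) - 2)) ** 0.5

print(f"estimated trend (YR): {slope:.2f} bushels/acre/year")
print(f"year-to-year SD around the trend (YEAR): {resid_sd:.2f}")
```

    A mixed-model routine would estimate both parts in one fit, but the split is the same idea: the trend answers "how much does yield improve per year?", and the variance answers "how much does an individual year bounce around that improvement?".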

    Regards,

    Jon

    ------------------------------
    Jon Baldock, Ph.D.
    Baldock Statistical Services
    jon@jbstats.com
    ------------------------------