ASA Connect

  • 1.  Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-21-2019 20:05
    Hi everyone, 

    I am reading a book on advanced regression models, and it states the fitted (linear mixed-effects) model as: 

      Expected Cholesterol Level = 62.9465 + 30.6095 * female  + 0.9378 * age - 0.9640 * month 

    Here, cholesterol stands for LDL (or "bad") cholesterol.  
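    A minimal Python sketch of what the fitted equation computes follows. It evaluates the fixed-effects part only, assuming `female` is coded 1 for female and 0 for male, `age` is in years and `month` counts months in the study. The function name `predicted_ldl` and the example patient are hypothetical. The subject's random intercept and random slope are set to zero, so the result is the estimated mean for a typical subject rather than a prediction for any individual patient.

```python
# Fixed-effect estimates copied from the fitted equation above.
INTERCEPT = 62.9465
B_FEMALE = 30.6095  # assumed coding: female = 1, male = 0
B_AGE = 0.9378      # per year of age
B_MONTH = -0.9640   # per additional month in the study


def predicted_ldl(female: int, age: float, month: float) -> float:
    """Estimated mean LDL from the fixed effects only.

    The subject-level random intercept and random month slope are set to
    zero, so this is the fitted mean for a 'typical' subject.
    """
    return INTERCEPT + B_FEMALE * female + B_AGE * age + B_MONTH * month


if __name__ == "__main__":
    # Hypothetical patient: a 50-year-old female at month 6 of the study.
    print(round(predicted_ldl(female=1, age=50, month=6), 4))  # 134.662
```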

    After noting that all the fixed-effects predictors are statistically significant at the 0.05 level, and that only the variance of the random slope for month is significant at the 0.05 level (the model also includes a random intercept for subject), the book provides the following interpretation: 

    "On average, cholesterol level is 30.6095 points higher for female than male patients.  As age increases by one year, the estimated mean LDL increases by 0.9203 points, and it decreases by 1.0957 points for every additional month in the study." 

    First Question:  Have I gone mad or are the numbers listed in this interpretation totally off (except for the one for the female dummy variable)?
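    For reference, a model of the kind described can be written as follows for subject $i$ at visit $j$. This is a sketch: the indices, the random-effects covariance matrix $\mathbf{G}$ and the assumption that age is recorded at each visit are illustrative, not taken from the book.

```latex
\text{LDL}_{ij} = \beta_0 + \beta_1\,\text{female}_i + \beta_2\,\text{age}_{ij} + \beta_3\,\text{month}_{ij}
  + b_{0i} + b_{1i}\,\text{month}_{ij} + \varepsilon_{ij},
\qquad (b_{0i}, b_{1i})^{\top} \sim N(\mathbf{0}, \mathbf{G}), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
```

    Because the random effects have mean zero, $\beta_2$ is the change in mean LDL for a one-year increase in age with sex and month held fixed, and $\beta_3$ is the change per additional month with sex and age held fixed. With $\hat\beta_2 = 0.9378$ and $\hat\beta_3 = -0.9640$ in the fitted equation, those are the values an interpretation of that equation would report.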
     

    Second Question:  Suppose the numbers in the above interpretation are actually matching those reported in the fitted model equation and the interpretation reads like this: 

    "On average, cholesterol level is 30.6095 points higher for female than male patients. As age increases by one year, the estimated mean LDL increases by 0.9378 points, and it decreases by 0.9640 points for every additional month in the study."

    Wouldn't it be more appropriate to interpret the results along these lines: 

    "On average, among patients of the same age who have spent the same number of months in the study, cholesterol level is 30.6095 points higher for female than male patients. As age increases by one year, the mean LDL increases by an estimated 0.9378 points for patients of the same sex who have spent the same number of months in the study; for patients of the same sex and age, the mean LDL decreases by an estimated 0.9640 points for every additional month in the study."

    This way, the reader can clearly see that each reported effect is conditional on the values of the other predictors in the model. If one leaves out these conditional qualifiers (the phrases added in the version above), is it automatically understood or implied that the reader should take them into account? (Of course, leaving the qualifiers out could also wrongly imply that the reported effects were not adjusted for the other predictors in the model.)  
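    To make the conditional reading concrete, here is a small Python sketch (function names hypothetical, same 0/1 coding assumption for `female`, random effects set to zero). Each coefficient is recovered as the difference between two predictions in which only one predictor changes while the others are held at the same values. Because the model has no interaction terms, the difference does not depend on which common values are chosen.

```python
INTERCEPT, B_FEMALE, B_AGE, B_MONTH = 62.9465, 30.6095, 0.9378, -0.9640


def predicted_ldl(female, age, month):
    """Fixed-effects estimated mean LDL (random effects set to zero)."""
    return INTERCEPT + B_FEMALE * female + B_AGE * age + B_MONTH * month


def sex_difference(age, month):
    """Female minus male, with age and months in the study held equal."""
    return predicted_ldl(1, age, month) - predicted_ldl(0, age, month)


def age_difference(female, month, age):
    """One extra year of age, with sex and months in the study held equal."""
    return predicted_ldl(female, age + 1, month) - predicted_ldl(female, age, month)


def month_difference(female, age, month):
    """One extra month in the study, with sex and age held equal."""
    return predicted_ldl(female, age, month + 1) - predicted_ldl(female, age, month)


if __name__ == "__main__":
    # Without interactions, the held-equal values do not change the answers.
    for age, month in [(30, 0), (50, 6), (70, 12)]:
        print(
            age,
            month,
            round(sex_difference(age, month), 4),
            round(age_difference(1, month, age), 4),
            round(month_difference(0, age, month), 4),
        )
```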

    One reason I ask is that I often see statements like the one below when model results such as these are reported: 

    Being a female, being older and spending more months in the study are associated with higher mean LDL levels.  

    Again, this type of statement leaves out the conditional qualifier (e.g., being female rather than male but having the same age and spending the same number of months in the study). Am I wrong in thinking that this type of reporting is actually misleading? 

    Maybe I am complicating matters more than I should - I am curious to hear your thoughts.  

    Thanks, 

    Isabella

    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    Web: www.ghement.ca
    E-mail: isabella@ghement.ca
    Phone: 604-767-1250

    ------------------------------


  • 2.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-22-2019 11:02

    For a layperson audience, I've always struggled to word the interpretation of regression coefficients so that laypersons can understand them correctly. For an audience with advanced regression knowledge, why do you need to state anything more than just the formula? And in publications, I've seen everything under the sun.

    https://en.wikipedia.org/wiki/Linear_regression#Interpretation



    ------------------------------
    Matthew Robinson
    ------------------------------



  • 3.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-23-2019 12:10
    Isabella:  Excellent and appropriate wording.  I wonder if a group in ASA could (if such a group doesn't already) facilitate participation of statisticians as reviewers for subject-matter journals.  I occasionally review articles in wildlife journals but I could certainly do more.  While there are subject-matter folks who do (really) know statistics, there aren't enough of them.  Doing so might help with our ongoing identity crisis (which I define as "Everyone with a computer thinks they're a statistician").

    Matthew:  While giving the equation is essential, I would also ask for a measure of prediction precision, standard errors of the estimated parameters, and maybe even the parameter covariance matrix, if not in the text then at least in an appendix or supplement. Also, more journals allowing interactive graphics could greatly help with interpretation. Below is an example based on the regression that Isabella provided:

    [Interactive graphic: Cholesterol levels predicted by age and gender.]
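    A static stand-in for such a graphic can be sketched by evaluating the fixed-effects equation over a grid of ages for each sex. The age range, the choice of month 0 and the function names below are illustrative assumptions. Plotted, the two columns give two parallel lines 30.6095 points apart, because the model has no sex-by-age interaction.

```python
INTERCEPT, B_FEMALE, B_AGE, B_MONTH = 62.9465, 30.6095, 0.9378, -0.9640


def predicted_ldl(female, age, month):
    """Fixed-effects estimated mean LDL (random effects set to zero)."""
    return INTERCEPT + B_FEMALE * female + B_AGE * age + B_MONTH * month


def prediction_grid(ages, month=0):
    """Return {label: [(age, predicted LDL), ...]} for each sex at a fixed month."""
    return {
        label: [(age, predicted_ldl(code, age, month)) for age in ages]
        for label, code in (("male", 0), ("female", 1))
    }


if __name__ == "__main__":
    grid = prediction_grid(range(20, 81, 10), month=0)
    print(f"{'age':>4} {'male':>9} {'female':>9}")
    for (age, male), (_, female) in zip(grid["male"], grid["female"]):
        print(f"{age:>4} {male:>9.2f} {female:>9.2f}")
```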


    ------------------------------
    Jim Baldwin
    Retired
    ------------------------------



  • 4.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-26-2019 06:46
    I think confidence intervals are more informative than standard errors for the regression coefficients. I don't think the covariance matrix should be included in the main text or tables; I would stick to reporting directly interpretable information. It could well be included (with full precision) in an electronic supplement if the raw data are not public (as they usually cannot be when the data are on persons). I agree that interaction plots are valuable in regression models that include interactions.
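    For a coefficient whose standard error is reported, the simplest conversion to a confidence interval is the large-sample Wald interval, sketched below in Python (the function name is hypothetical). For mixed models, software often also offers t-based intervals with approximate degrees of freedom or profile-likelihood intervals, which may be preferable in small samples.

```python
from statistics import NormalDist


def wald_ci(estimate: float, se: float, level: float = 0.95) -> tuple[float, float]:
    """Large-sample (Wald) confidence interval: estimate +/- z * SE."""
    if se < 0:
        raise ValueError("standard error must be non-negative")
    if not 0 < level < 1:
        raise ValueError("level must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return estimate - z * se, estimate + z * se
```

    Called as `wald_ci(estimate, se)` with the values reported by the fitting software, it returns the lower and upper limits.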

    ------------------------------
    Tore Wentzel-Larsen
    Researcher
    Norwegian Centre for Violence and Traumatic Stress Studies,
    Regional Center for Child and Adolescent Mental Health, Eastern and Southern Norway
    ------------------------------



  • 5.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-26-2019 09:53
    I recognize that including the parameter covariance (or correlation) matrix sounds a bit overboard, but I've seen more than a few analyses where the model is overparameterized, and looking at the correlation matrix is one way to detect that. That situation usually only happens with a nonlinear model based on some physical theory, where just looking at correlations among predictors isn't sufficient. Also, storage space in supplements is now cheap and likely getting cheaper.
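    Converting a parameter covariance matrix to a correlation matrix and scanning it for near-perfect correlations takes only a few lines. The Python sketch below uses hypothetical function names, a made-up covariance matrix and a cutoff of 0.95, which is an arbitrary rule of thumb rather than a standard.

```python
import math


def cov_to_corr(cov):
    """Convert a parameter covariance matrix (list of lists) to correlations."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


def flag_high_correlations(cov, names, threshold=0.95):
    """Return (name_i, name_j, r) for parameter pairs with |r| above threshold."""
    corr = cov_to_corr(cov)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i][j]) > threshold:
                flagged.append((names[i], names[j], corr[i][j]))
    return flagged


if __name__ == "__main__":
    # Made-up covariance matrix for three parameters of a hypothetical nonlinear model.
    cov = [
        [4.00, 3.96, 0.10],
        [3.96, 4.00, 0.00],
        [0.10, 0.00, 1.00],
    ]
    for a, b, r in flag_high_correlations(cov, ["a", "b", "c"]):
        print(f"{a} and {b}: r = {r:.3f}")
```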

    ------------------------------
    Jim Baldwin
    Retired
    ------------------------------



  • 6.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-23-2019 07:40
    No, I don't think you have gone mad! It looks like some careless typos.

    I think your longer statement is more accurate, but it may be unduly long. I might say, "With the other variables statistically held constant, on average, cholesterol level is 30.6095 points higher for female than male patients, increases by 0.9378 points as age increases by one year, and decreases by 0.9640 points for every additional month in the study." 

    Or something like that.

    Ed


    ------------------------------
    Edward Gracely
    Drexel University
    ------------------------------



  • 7.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-27-2019 13:12
    Hello Everyone,

    Excellent comments by all, with which I heartily concur. I would only add this: in my experience (which eventually included writing literally hundreds of reports describing regression models for technical audiences), my early descriptions of those models typically explained the meaning of the equations in words, much like the passage Isabella cited, because I too was following textbook examples. I was trying to give the reader a connection between the regression coefficients and the graphical and tabular displays that followed in my reports, as well as the written conclusions based on the regression. However, these descriptions typically proved worthwhile only when the model was strictly linear in the factors (i.e., contained no interactions or higher-order terms). Beyond first-order effects, the terms must be described in combination with one another to capture a factor's total net effect, and the resulting complexity of the written descriptions exceeded their value as aids to comprehension.

    Which brings me to both an observation and a question. While describing an equation in words does connect the mathematical expression to the other displays in a complete written presentation, it only works in the reader's mind if the model is very simple (i.e., linear in the factors). This is fine for introductory material, or when a linear-in-the-factors model is the true model, but the practice can quickly obscure the true situation as models grow more complex and begin to include higher-order terms. This also leads me to conclude, as Matthew has observed, that the value to the reader lies mostly in the graphical and tabular displays, and that the written description adds little value, and then only under infrequent conditions. Which raises the question: why do writers need to provide these descriptions at all? As educators, researchers and authors, aren't we training future writers of technical material on regression models to provide a written construct that actually reduces reader comprehension, or at least needlessly complicates the writing process as models naturally become more complex?
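    The point about higher-order terms can be illustrated with a hypothetical extension of the model that adds a female-by-age interaction. Then neither the age coefficient nor the female coefficient alone is a factor's effect; each net effect combines two terms. A minimal Python sketch follows; the coefficient names and values are made up for illustration and do not come from the thread.

```python
def age_slope(coef, female):
    """Net change in mean LDL per extra year of age for the given sex (0/1)."""
    return coef["age"] + coef["female:age"] * female


def sex_gap(coef, age):
    """Female minus male mean LDL at a given age, other terms held equal."""
    return coef["female"] + coef["female:age"] * age


if __name__ == "__main__":
    # Made-up coefficients for a hypothetical model with a female:age interaction.
    coef = {"female": 10.0, "age": 0.8, "female:age": 0.4}
    for age in (30, 60):
        print(f"sex gap at age {age}: {sex_gap(coef, age):.1f}")
    print(f"age slope, male: {age_slope(coef, 0):.2f}, female: {age_slope(coef, 1):.2f}")
```

    With an interaction, a single sentence per coefficient no longer describes a factor's effect; the sex gap has to be stated at particular ages, which is where a plot or table does the work more clearly.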

    I am also tempted to infer that written descriptions of the effects of individual factors in equations may be the source of much confusion in the non-scientific literature, when these statements are later taken out of context and restated for audiences other than the original one, simply because they make good "soundbites."

    Personally, I liked both Isabella's slightly more detailed corrected description, for its improved clarity and the protection it provides against future oversimplification, and Ed's brief and correctly expressed description, for its simplicity and elegance. Both were much more valuable to me as a reader than the original text. Clearly, if we as an organization are going to endorse the practice of providing written links between regression equations and other displays, especially in textbooks, then we also need to provide guidance for authors, reviewers and editors, along with excellent examples such as those in this thread.

    Tom

    Thomas D. Sandry, PhD
    Industrial Statistical Consultant, Retired

    ------------------------------
    Thomas Sandry
    ------------------------------



  • 8.  RE: Interpretation of effects - should conditional qualifiers be stated explicitly?

    Posted 08-27-2019 20:02

    Some minor points that will be news to almost nobody on the list :-) ... (1) Isabella's phrasing is fine for a small and simple model, but as the number of variables increases, it would be necessary to move to a more generic phrase such as "on average, all else equal." (2) Again, in more complex settings, any written or graphical description of the model should generally be limited to the 1-3 key explanatory variables of interest; not every coefficient needs to be explained. (3) Even more trivial, but the model assumes that the effect of, say, sex is independent of age and months in the program, and this likely hasn't been tested or assessed.
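    Point (3) could be checked by comparing the fitted model with one that adds, say, a sex-by-age interaction. A minimal Python sketch of the likelihood-ratio test for that comparison follows, assuming the two log-likelihoods are read from the fitting software and that the extra term adds exactly one parameter (the function name is hypothetical). When the models differ in their fixed effects, both fits should use maximum likelihood rather than REML, since REML log-likelihoods are not comparable across different fixed-effects structures.

```python
import math


def lr_test_1df(loglik_reduced: float, loglik_full: float) -> tuple[float, float]:
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns (statistic, p_value), using the chi-square distribution with
    1 degree of freedom: P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    # Clip tiny negative values that can arise from optimizer noise.
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```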

    I am somewhat sympathetic to the original authors. Including that phrasing in every sentence is tiresome for the reader, so an introductory sentence expressing "Keep in mind that all of the effects I am about to highlight are conditional on the other variables in the model ..." would work. I also wonder whether there may be scenarios where it is more confusing than clarifying. The conditional (or independent) effect of, for example, sex is the same regardless of the values of the other two variables. "All else equal" is a reminder that you should not compare a 60-year-old female with a 55-year-old male and ascribe all of that difference to the male-female difference, which is a pretty straightforward point and one that was made in justifying the model. But if this is the first, simplest model in a hierarchical series of models that involve sex*age interactions, etc., then "this is the effect of sex for participants who are both 60 (controlling for everything else in the model)" is very important to understand, but it may not be clearly different, to the reader, from "this is the effect of sex for participants who are the same age."
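    The 60-year-old female versus 55-year-old male comparison can be worked through with the fitted coefficients. A minimal Python sketch, assuming both patients are at the same month in the study (so the intercept and month terms cancel); the function name is hypothetical.

```python
B_FEMALE, B_AGE = 30.6095, 0.9378  # fixed-effect estimates from the fitted equation


def decompose_difference(age_female, age_male):
    """Split the estimated mean-LDL gap between a female and a male patient
    (same month in the study) into the part due to sex and the part due to age."""
    sex_part = B_FEMALE
    age_part = B_AGE * (age_female - age_male)
    return sex_part, age_part, sex_part + age_part


if __name__ == "__main__":
    sex_part, age_part, total = decompose_difference(60, 55)
    # prints: sex: 30.6095, age: 4.6890, total: 35.2985
    print(f"sex: {sex_part:.4f}, age: {age_part:.4f}, total: {total:.4f}")
```

    About 4.69 of the roughly 35.30-point estimated gap comes from the five-year age difference, not from sex.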

    I like the interactive graphic. Expanding it to models with more variables can be challenging, of course. 



    ------------------------------
    Walter Davis
    Senior Research Fellow
    PowerLab
    ------------------------------