Hello Everyone,
Excellent comments by all, with which I heartily concur. I would only add that, in my experience (which eventually led to writing literally hundreds of reports describing regression models to technical audiences), my early efforts to describe those models typically included descriptions of the meanings of the equations, similar to Isabella's citation, because I too was following examples from textbooks. This was an attempt on my part to connect, for the reader, the regression coefficients with the graphical and tabular displays that followed in my reports, as well as with the written conclusions based on the regression. However, this effort at description typically only proved worth providing if the model was strictly linear-in-the-factors (i.e., contained no interactions or higher-order terms). Beyond first-order effects, the terms must be described in combination with one another in order to capture the totality of a factor's net effect, and the resulting complexity of the written descriptions exceeded their value as aids to comprehension.
Which brings me to both an observation and a question: while describing an equation in words does connect the mathematical expression to the other displays in a complete written presentation, it only works in the mind of the reader if the model is very simple (i.e., linear-in-the-factors). This is fine for introductory situations or when linear-in-the-factors is the true model, but this writing practice can rapidly obscure the true situation when models grow more complex and begin to include higher-order terms. This leads me to also conclude, as Matthew has observed, that the value to the reader lies mostly in the graphical and tabular displays, and that the written description adds little value, and then only under infrequently occurring conditions. Which raises the question: why do writers need to provide these descriptions at all? Aren't we, as educators, researchers and authors, training future writers of technical material involving regression models to provide a written construct which actually reduces reader comprehension, or at least needlessly complicates the writing process, as future models naturally become more complex?
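To make the point about interactions concrete, here is a minimal sketch in Python. The coefficients and the factor names x1 and x2 are invented purely for illustration and do not come from any model discussed in this thread. It shows that once an interaction term is present, the "effect of x1" is no longer a single number that can be stated in one sentence; it depends on where x2 is held.

```python
# Hypothetical coefficients, invented for illustration only.
B0, B1, B2, B12 = 10.0, 2.0, -1.0, 0.5


def predict(x1, x2):
    """Mean response for a model with main effects and an x1*x2 interaction."""
    return B0 + B1 * x1 + B2 * x2 + B12 * x1 * x2


def effect_of_x1(x2, x1=0.0):
    """Change in the predicted mean when x1 rises by one unit, x2 held fixed."""
    return predict(x1 + 1.0, x2) - predict(x1, x2)


# The one-unit effect of x1 equals B1 + B12 * x2, so it changes with x2.
for x2 in (0.0, 2.0, 4.0):
    print(f"x2 = {x2}: one-unit effect of x1 = {effect_of_x1(x2):.1f}")
```

A written description of this model would need to state the x1 effect separately for each level of x2 of interest, which is exactly where the prose starts to cost more than it returns.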
I am also tempted to infer that written descriptions of the effects of individual factors in equations may be the source of much confusion in the non-scientific literature when these statements are later taken out of context and restated for other than their original audience, simply because they make good "soundbites."
Personally, I liked both Isabella's slightly more detailed corrected description, for its improved clarity and the protection it provides against future over-simplification, and Ed's brief and correctly expressed description, for its simplicity and elegance. Both expressions were much more valuable to me as a reader than the original text. Clearly, if we as an organization are going to endorse the practice of providing written linkage between regression equations and other displays, especially in textbooks, then we also need to provide guidance for authors, reviewers and editors and to provide them with excellent examples such as those on this thread.
Tom
Thomas D. Sandry, PhD
Industrial Statistical Consultant, Retired
------------------------------
Original Message:
Sent: 08-21-2019 20:05
From: Isabella Ghement
Subject: Interpretation of effects - should conditional qualifiers be stated explicitly?
Hi everyone,
I am reading a book on advanced regression models and the book states the fitted (linear mixed effects) model as:
Expected Cholesterol Level = 62.9465 + 30.6095 * female + 0.9378 * age - 0.9640 * month
Here, cholesterol stands for LDL (or "bad") cholesterol.
After noting that all the fixed effects predictors are (statistically) significant at the 0.05 level and commenting that only the variance of the random slope for month is significant at the 0.05 level (with the model also including a random intercept for subject), the book provides the following interpretation:
"On average, cholesterol level is 30.6095 points higher for female than male patients. As age increases by one year, the estimated mean LDL increases by 0.9203 points, and it decreases by 1.0957 points for every additional month in the study."
First Question: Have I gone mad or are the numbers listed in this interpretation totally off (except for the one for the female dummy variable)?
Second Question: Suppose the numbers in the above interpretation are actually matching those reported in the fitted model equation and the interpretation reads like this:
"On average, cholesterol level is 30.6095 points higher for female than male patients. As age increases by one year, the estimated mean LDL increases by 0.9378 points, and it decreases by 0.9640 points for every additional month in the study."
Wouldn't it be more appropriate to interpret the results along these lines:
"On average, among patients of the same age and who spent the same number of months in the study, cholesterol level is 30.6095 points higher for female than male patients. As age increases by one year, the estimated mean LDL increases by an estimated 0.9378 points for patients having the same sex and the same number of months in the study; for patients with the same sex and same age, the mean LDL decreases by an estimated 0.9640 points for every additional month in the study."
This way, the reader can clearly see that each of the reported effects is conditional on the values of the other predictors included in the model. If one leaves out these conditional qualifiers (the "same age", "same sex" and "same number of months in the study" phrases), is it automatically understood or implied that they should actually be taken into account by the reader? (Of course, leaving the qualifiers out could also wrongly imply that one did not adjust the reported effect for the effects of the other predictors in the model.)
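As a rough illustration of what the qualifiers mean numerically, the following Python sketch plugs values into the fixed-effects part of the fitted equation quoted above. It ignores the random intercept and random slope, and the patient profiles (ages 40, 50 and 60; months 0, 6 and 12) are hypothetical values chosen only for the example. The function name expected_ldl is likewise made up.

```python
# Fixed-effects coefficients as reported in the fitted equation above.
COEF = {"intercept": 62.9465, "female": 30.6095, "age": 0.9378, "month": -0.9640}


def expected_ldl(female, age, month):
    """Expected LDL from the fixed effects only (random effects ignored)."""
    return (COEF["intercept"] + COEF["female"] * female
            + COEF["age"] * age + COEF["month"] * month)


# Hypothetical reference patient: age 50, month 6 of the study.
# Each difference changes ONE predictor and holds the other two fixed.
sex_gap = expected_ldl(1, 50, 6) - expected_ldl(0, 50, 6)
age_step = expected_ldl(1, 51, 6) - expected_ldl(1, 50, 6)
month_step = expected_ldl(1, 50, 7) - expected_ldl(1, 50, 6)

# A comparison that does NOT hold the other predictors fixed:
# a 40-year-old female at month 0 versus a 60-year-old male at month 12.
mixed_gap = expected_ldl(1, 40, 0) - expected_ldl(0, 60, 12)

print(f"female vs male, same age and month: {sex_gap:.4f}")
print(f"one more year, same sex and month:  {age_step:.4f}")
print(f"one more month, same sex and age:   {month_step:.4f}")
print(f"female vs male, age and month differ: {mixed_gap:.4f}")
```

The first three differences reproduce the coefficients only because the other predictors are held at the same values; the last comparison, where age and month also differ, gives a different gap between a female and a male patient.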
One of the reasons I ask this question is because I often see statements like the one below when model results such as the above are being reported:
Being a female, being older and spending more months in the study are associated with higher mean LDL levels.
Again, this type of statement leaves out the conditional qualifier (e.g., being a female rather than a male but having the same age and spending the same number of months in the study) - am I wrong in thinking that this type of reporting is actually misleading?
Maybe I am complicating matters more than I should - I am curious to hear your thoughts.
Thanks,
Isabella
------------------------------
Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
Web: www.ghement.ca
E-mail: isabella@ghement.ca
Phone: 604-767-1250
------------------------------