I hope you are all off to a lovely and enjoyable Fall.
A couple of questions have emerged from projects I am working on. I will list the questions here for your input, while also trying to provide some brief background.
Q1: For this question, I am working with a multilevel data set that contains multiple sites (level 2); each site contains multiple groups of animals (level 1); and each group contains several focal animals (level 0).
The response variable is measured once per focal animal and is expressed as a proportion of time the focal animal is vigilant over a pre-set observation period.
One of the predictor variables (rut phase) tracks whether the group of animals is in a rut or pre-rut phase. On the face of it, this variable should be a group-level variable expressed at level 1 of the data hierarchy.
However, because different groups are observed in different years even at the same site, rut phase ends up being a mixture of a level 2 and a level 1 predictor (i.e., a "mixed-up" predictor). In the data, one site has only the pre-rut phase as the predictor value for all of its groups; a few others have only the rut phase; the majority have both pre-rut and rut phases represented.
Since I have not worked with "mixed-up" predictors before, I am wondering: is it OK to fit the model with this predictor "as is", or do I have to re-express it somehow so that it becomes a level 2 predictor (with level 2 being the highest level of the data hierarchy)?
In general, what is the best way to handle these "mixed-up" predictors in the context of multilevel models, especially in a situation like this, where the predictor sits smack in the middle of the data hierarchy rather than at the bottom?
Q2: For this question, assume that we have a 2-level data hierarchy - children nested in schools - and that for each child we measure the response variable of interest just twice: before and after some event. The response variable can be assumed to be continuous (although in my project it is a proportion).
Because we only have 2 response values per child, it is my understanding that we are constrained to a mixed-effects model which includes only a random intercept if we try to relate the response variable to the predictor variable "occasion" (i.e., before vs. after). Is that correct? (Someone on Twitter mentioned a while ago that we could actually consider a random intercept and a random slope for the "occasion" predictor as long as they are uncorrelated.)
The "random intercept only" constraint doesn't go very far in capturing the patterns seen in the data, where the lines connecting the observed response values between the two occasions, before and after, do NOT all go in the same direction (as would be implied if we do NOT allow for a random slope).
Is it just me, or is the "random intercept only" model doomed from the start? What are the modelling alternatives here, if any, to get better alignment between what is seen in the data and what is allowed for in the model?
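For concreteness, here is a small simulated sketch in Python's statsmodels (made-up variable names; the school level is omitted to keep it short) of the two specifications I have in mind: the random-intercept-only model, and a random intercept plus an occasion slope that is forced to be uncorrelated with it (entered as a variance component), which is how I understand the Twitter suggestion.

```python
# Illustrative sketch only - simulated data, hypothetical variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
rows = []
for child in range(200):
    a = rng.normal(0, 1.0)        # child-specific intercept deviation
    b = rng.normal(0.5, 0.8)      # child-specific change; its sign varies across children
    for occasion in (0, 1):       # 0 = before, 1 = after
        y = 2.0 + a + b * occasion + rng.normal(0, 0.3)
        rows.append({"child": child, "occasion": occasion, "y": y})
df = pd.DataFrame(rows)

# Random intercept only: forces all child-level "before -> after" lines
# to be parallel in the fitted values.
ri = smf.mixedlm("y ~ occasion", df, groups=df["child"]).fit()

# Random intercept plus an occasion slope entered as a variance component,
# i.e., uncorrelated with the intercept by construction.
ris = smf.mixedlm("y ~ occasion", df, groups=df["child"],
                  re_formula="1",
                  vc_formula={"occasion": "0 + occasion"}).fit()

print(ri.params["occasion"], ris.params["occasion"])
```

With two observations per child, the uncorrelated version is just identified (intercept variance from the within-child covariance, slope variance from the extra spread at the second occasion), whereas adding an intercept-slope correlation would add a fourth variance parameter to only three estimable moments.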
Thanks and I look forward to whatever thoughts you are able to share.
To Q1: an explanatory variable that varies across L0 units within L1 clusters would be an L0 variable, even if it takes a single value for all units in some L1 clusters. But with SAS and R, at least, the level of a fixed-effect variable doesn't need to be specified; you can just include this variable like the others.
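For example, a minimal sketch in Python's statsmodels (simulated data and made-up variable names; the same idea carries over to SAS or R): rut phase goes in as an ordinary fixed effect, while the hierarchy is handled by a random intercept for site plus a variance component for group nested within site.

```python
# Illustrative sketch only - simulated data, hypothetical variable names.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
rows = []
for site in range(10):
    site_eff = rng.normal(0, 0.5)
    for grp in range(4):
        grp_eff = rng.normal(0, 0.3)
        rut = int(rng.integers(0, 2))   # group-level predictor (pre-rut vs rut)
        for animal in range(5):
            y = 0.3 + 0.1 * rut + site_eff + grp_eff + rng.normal(0, 0.2)
            rows.append({"site": site, "group": f"s{site}g{grp}",
                         "rut_phase": rut, "vigilance": y})
df = pd.DataFrame(rows)

# groups= gives the top-level clusters (sites); the vc_formula adds a
# variance component for group within site. rut_phase is entered as an
# ordinary fixed effect, with no level declaration needed.
model = smf.mixedlm("vigilance ~ rut_phase", df, groups=df["site"],
                    re_formula="1",
                    vc_formula={"group": "0 + C(group)"})
fit = model.fit()
print(fit.params["rut_phase"])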
Thanks very much for your answer, Vince. I've just sent you an email at the email address you provided. If you have any thoughts on Q2, I would love to hear those as well.