ASA Connect

  • 1.  Question on interpreting ZOIB models with random effects

    Posted 05-24-2022 17:47
    Hi everyone, 

    Long time, no see - I hope everyone is doing as well as possible. 

    Lately, I've been wrestling with some zero-and-one-inflated beta (ZOIB) regression models that include random effects, and I am not sure whether my interpretation of them is correct, so I am looking for confirmation that I am on the right track. These models are more complicated because they have 4 different components: mu (mean), phi (precision, which governs dispersion), zoi (zero-one inflation) and coi (conditional one inflation).

    To keep things simple, let's say that my first model, model1, is fitted like this using the brms package: 

    ```
    library(brms)

    model1 <- brm(
      formula = bf(
        student_response ~ 1 + district_type + (1 | district/school),  # mu
        phi ~ 1 + district_type + (1 | district/school),
        zoi ~ 1 + district_type + (1 | district/school),
        coi ~ 1 + district_type + (1 | district/school)
      ),
      family = zero_one_inflated_beta(),
      ...
    )
    ```

    where students are nested in schools, which are nested in districts. The response variable is measured at the student level and is a proportion that can take any value in [0, 1], including exactly 0 and exactly 1.

    It is my understanding that the first model component, mu, models the logit-transformed mean of the response variable as a function of the district_type predictor and the district and school-within-district random effects, but only for those students whose responses are NOT 0 or 1. The second model component, phi, models the variability of these response values about mu as a function of the same terms. Is my understanding correct?
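
    To make the mu/phi question concrete, here is a minimal base-R sketch of the mean-precision Beta parameterization that, as far as I can tell from the brms documentation, is used for the responses strictly inside (0, 1): shape1 = mu * phi and shape2 = (1 - mu) * phi, with a logit link on mu and a log link on phi by default. The linear-predictor values are made up for illustration and do not come from any fitted model.

    ```r
    # Hypothetical linear-predictor values (NOT estimates from a real fit).
    eta_mu  <- 0.4   # mu component, logit scale
    eta_phi <- 2.0   # phi component, log scale

    mu  <- plogis(eta_mu)  # inverse logit: mean of the responses in (0, 1)
    phi <- exp(eta_phi)    # inverse log: precision (larger phi = less spread)

    # Mean-precision parameterization of the Beta distribution.
    shape1 <- mu * phi
    shape2 <- (1 - mu) * phi

    beta_mean <- shape1 / (shape1 + shape2)  # algebraically equal to mu
    beta_var  <- mu * (1 - mu) / (1 + phi)   # shrinks as phi grows

    # Compare the closed-form moments with simulated draws.
    set.seed(1)
    draws <- rbeta(1e5, shape1, shape2)
    c(mean = mean(draws), var = var(draws))
    ```

    If this parameterization is the right one, a positive district_type coefficient in the phi component would correspond to less variability about mu, not more, which matters for how the phi slopes are worded.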

    What throws me off is the fact that district_type (let's say, Large versus Small) is a district-level variable in this model, so interpreting its effect seems trickier.  I can't say things like "for the typical school nested inside the typical district", because that would preclude district taking two possible values, Large or Small, at the same time.

    But can I say something like:  If we compare the logit-transformed mean response values for students in two districts such that one district is Large and the other is Small but which both have the same value for the random effect of District and contain schools with the same values for the random effect of school, then the difference in the logit-transformed mean response values for these students is captured by the slope of district_type in the mu component of the model? (None of the students in question provided response values of 0 or 1; only values in (0,1).) 

    I am just looking for an interpretation of the slope of district_type in the mu component that I can live with, though this seems hard to pull off and to put into simple words.  Maybe I can simplify this further to: 

    If we compare the logit-transformed mean response values for students in two middle-of-the-pack districts, one Large and the other Small, which both contain middle-of-the-pack schools, then the difference in the logit-transformed mean response values for these students is captured by the slope of district_type in the mu component of the model? (None of the students in question provided response values of 0 or 1; only values in (0,1).) 

    As an aside, if mu is the expected value of a "discrete" (rather than "continuous") non-zero and non-one proportion, does it make sense to use terminology like "odds" when describing what the exponentiated value of the slope of district_type means in the mu component? 

     When combining the four model components (i.e., mu, phi, zoi and coi), one can get the expected value of the student response variable, regardless of whether that response was 0, 1 or something in-between.  What is the best way to describe the meaning of that expected response value?  Can we now say it represents the expected response value for students in a middle-of-the-pack school (i.e., a school with a random school effect equal to 0) located in a middle-of-the-pack district (i.e., a district with a random district effect equal to 0), regardless of whether their responses were equal to 0, 1 or something in-between? 
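
    For concreteness, here is a minimal base-R sketch of how the components combine into that overall expected value, assuming the usual ZOIB mixture: P(y = 0) = zoi * (1 - coi), P(y = 1) = zoi * coi, and a Beta distribution with mean mu otherwise. The linear-predictor values are hypothetical, with the district and school random effects set to 0.

    ```r
    # Hypothetical fixed-effect linear predictors for one district_type level,
    # with district and school random effects set to 0 (made-up numbers).
    eta_mu  <- 0.4    # logit scale
    eta_zoi <- -2.0   # logit scale
    eta_coi <- 0.5    # logit scale

    mu  <- plogis(eta_mu)   # mean of the responses strictly inside (0, 1)
    zoi <- plogis(eta_zoi)  # P(response is 0 or 1)
    coi <- plogis(eta_coi)  # P(response is 1 | response is 0 or 1)

    p_zero  <- zoi * (1 - coi)  # P(response == 0)
    p_one   <- zoi * coi        # P(response == 1)
    p_inner <- 1 - zoi          # P(0 < response < 1)

    # Overall expected response, counting 0s, 1s and in-between values.
    expected_response <- p_zero * 0 + p_one * 1 + p_inner * mu
    expected_response
    ```

    Two things stand out in this sketch: phi drops out of the expected value entirely, and because the inverse links are nonlinear, plugging in random effects of 0 gives the expected response for a typical school in a typical district rather than an average over all schools and districts.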

    My second ZOIB model looks something like this: 

    ```
    library(brms)

    model2 <- brm(
      formula = bf(
        student_response ~ 1 + student_status + (1 | school),  # mu
        phi ~ 1 + student_status + (1 | school),
        zoi ~ 1,
        coi ~ 1
      ),
      family = zero_one_inflated_beta(),
      ...
    )
    ```
    where now there are just students nested inside schools, student_status is a binary predictor (let's say: good student vs. problematic student), and there are too few 0's and 1's to go fancy with modelling the zoi and coi components.

    So, for model2, can I interpret the effect of student_status for the mu component by comparing the "good students" with the "problematic students" in a middle-of-the-pack school (i.e., a school with a random school effect of 0) in terms of the logit-transformed value of their mean responses, assuming all of these responses were neither 0 nor 1? 

    And can I talk about the expected value of the student response variable for the students in a middle-of-the-pack school, regardless of whether their response was 0, 1 or something in-between 0 and 1?  

    My third model is a bit more complicated, as its mu component now includes a smooth term of a school-level variable (say, school revenue) for each of the two levels of student_status: 
     
    ```
    library(brms)

    # Third model (labelled model2 again in the original post; renamed here).
    model3 <- brm(
      formula = bf(
        student_response ~ 1 + student_status +
          s(school_revenue, by = student_status) + (1 | school),  # mu
        phi ~ 1 + student_status + (1 | school),
        zoi ~ 1,
        coi ~ 1
      ),
      family = zero_one_inflated_beta(),
      ...
    )
    ```


    How on earth do I interpret the effect of student_status in the mu component now?  I guess I can interpret this effect at different values of school_revenue? Something like:

    Comparing "good students" with "problematic students" at a middle-of-the-pack school which has a particular school_revenue value, the effect of student_status is given by ____ (what?). (I am thinking I need to compute some kind of marginal effect of student_status for the mu component?)
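
    One way I could imagine computing such a contrast is sketched below, on the logit scale of the mu component, at one chosen school_revenue value and with the school random effect excluded. It relies on brms::posterior_linpred() with its dpar and re_formula arguments; the function name status_contrast_mu and the level labels "good" and "problematic" are my own assumptions, and I have not confirmed that this is the recommended approach.

    ```r
    # Sketch only: posterior draws of the student_status contrast in the mu
    # component (logit scale) at a single school_revenue value.
    # `fit` is assumed to be a brmsfit like the third model above; the level
    # labels are hypothetical and must match the factor levels in the data.
    status_contrast_mu <- function(fit, revenue,
                                   status_levels = c("good", "problematic")) {
      newdata <- data.frame(
        student_status = factor(status_levels, levels = status_levels),
        school_revenue = revenue
      )
      # One column per row of newdata; re_formula = NA drops the school
      # random effect, i.e. a "middle-of-the-pack" school.
      linpred <- brms::posterior_linpred(
        fit, newdata = newdata, dpar = "mu", re_formula = NA
      )
      linpred[, 2] - linpred[, 1]  # second level minus first level
    }
    ```

    Something like quantile(status_contrast_mu(fit, revenue = r), c(0.025, 0.5, 0.975)) evaluated over a grid of r values would then show how the contrast changes with school revenue, since it combines the student_status coefficient with the difference between the two smooths at r.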

    Thank you in advance for any insights you are able to send my way.

    Isabella

    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    Tel: 604-767-1250
    E-mail: isabella@ghement.ca
    ------------------------------


  • 2.  RE: Question on interpreting ZOIB models with random effects

    Posted 06-27-2022 14:06
    Edited by Isabella Ghement 06-27-2022 14:11
    There has been no response to my questions - either the answers are obvious or they are more difficult than they appear.  I will try my luck on a different forum and will come back here if I get any updates there: 

    Question on interpreting ZOIB models with random effects - data analysis / models - Datamethods Discussion Forum

    These model interpretation questions are the ones that applied practitioners such as myself tend to struggle with the most. 

    Thank you!

    Isabella

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 3.  RE: Question on interpreting ZOIB models with random effects

    Posted 06-30-2022 07:29
    Hey Isabella,
    I suspect that the reason is the latter!
    So...don't be discouraged, seems like you're working on a non-obvious problem.
    Some ideas to move ahead....

    [0] Have you reached out to the authors of the brms library?
    They might be well-positioned to help.

    [1] Have you reached out to the authors of any of the papers cited in the brms documentation for this model?
    I.e., the brms package should have a documentation PDF. Find the model in the documentation and see whose paper they cite as the basis for it. Email those authors with your question.
    With any luck there might even be some helpful description about the parameters in the documentation that answers your question.
    Or the referenced paper might have the answer.

    [2] Have you popped your question in the Young Professionals Section forum? The Social[?] Statistics section forum?
    I've found that the specialized sections tend to have better response rates.
    YP Section has the students & post docs who are working on a wide variety of models. They might also have the time and be more keen to dig into an unfamiliar area. 

    [3] Can you write it up as a blog post and submit it to the Andrew Gelman blog?
    I bet a couple hundred people in that community will know the answer.

    Good luck!

    Best,
    Glen

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    The Data & Science Podcast / LifeBell AI
    ------------------------------