
  • 1.  multilevel trial analysis plan second opinion

    Posted 05-25-2023 17:02

    Hi All,

    Apologies for the long message. I have a few questions where I could use additional advice/opinions about the analysis plan for a study I am on.

    As background, it's a community-based intervention in which we are testing whether vaccination rates increase across a collection of 8 health care systems (the highest level), each of which has 7-8 clinics, each of which sees thousands of patients. The original design was a stepped wedge with 1 step: 4 systems were assigned to receive the intervention first, followed by the remaining 4 systems later. As originally designed, the study was well-powered to detect changes in vaccination rates measured at the individual (deidentified) patient level. In addition to the intervention effect, the PIs wanted to explore various mediators and moderators. The original plan was thus to use a generalized linear mixed model treating the individual outcomes as binary and accounting for the different time points (5) and the nested structure (patients within clinics within systems).

    Since then, I left the institution shortly before the grant was funded, but I have kept in some contact with the current statistician, who has filled in and informed me of some changes presented to us. (This has all come back up because I am included on a paper about the design and methods that they intend to submit soon.) Briefly, we were told that the systems may be unable or unwilling to provide the team with individual patient-level data. At one point, to the chagrin of all of us, we were told that the outcome (vaccination rates) would be measured at the system level, collapsed across the other levels, meaning we would have only 8 individual measurements! (Note that these rates are between 0 and 1 but no longer binary.) The statistician and I agreed this is not tenable (the model will not be identifiable with only 8 points and multiple covariates), so I plan to inform them that this won't work and that we need a new plan: either get data at the patient level as planned (statistically ideal but maybe not feasible in practice) or compromise at the clinic level (~60 clinics). Thus my questions are the following.

    1) The other statistician had an interesting idea. If we have proportions (vaccination rates, between 0 and 1) at the clinic level, e.g. p1, p2, ... p60 at a given time point, and we know the clinic sizes (n1 patients in clinic 1, n2, ... n60 patients in clinic 60), then he suggested we generate pseudo-observations for each clinic: for clinic 1, for example, n1*p1 observations with the outcome of interest (1s) and n1*(1-p1) observations without it (0s). These should correspond to deidentified patient-level outcomes (some vaccinated: 1, some not: 0). Then he suggested running a model on the pseudo-data in which we could include only clinic-level or system-level covariates. (Obviously any patient-level data, e.g. demographics, is masked and lost, but data at the clinic level or higher should be preserved.) My question is, has anyone done this? Is it valid? It seems logical but feels weird. Does anyone have any experience or thoughts either way? If this is valid, it would probably be the easiest and most powerful option.
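
    As a minimal sketch of why this expansion loses nothing relative to the clinic counts, the snippet below rebuilds 0/1 rows from a clinic's size and rate and checks that their Bernoulli log-likelihood matches the grouped binomial log-likelihood (up to a constant that does not depend on the parameters). The clinic sizes and rates are hypothetical, and the rounding step assumes the reported rates may not multiply out to exact integers.

```python
import math

def expand_to_binary(n_patients, rate):
    """Rebuild 0/1 outcomes for one clinic from its size and vaccination rate."""
    n_vaccinated = round(n_patients * rate)  # assumption: reported rates may be rounded
    return [1] * n_vaccinated + [0] * (n_patients - n_vaccinated)

def bernoulli_loglik(outcomes, prob):
    """Log-likelihood of individual 0/1 outcomes under a common probability."""
    return sum(math.log(prob) if y == 1 else math.log(1 - prob) for y in outcomes)

def binomial_kernel_loglik(successes, trials, prob):
    """Binomial log-likelihood, omitting the constant log C(trials, successes)."""
    return successes * math.log(prob) + (trials - successes) * math.log(1 - prob)

# Hypothetical clinics: (number of patients, reported vaccination rate)
clinics = [(1200, 0.45), (3400, 0.60), (2100, 0.52)]

for n, p in clinics:
    rows = expand_to_binary(n, p)
    y = sum(rows)
    for candidate in (0.3, 0.5, 0.7):
        expanded = bernoulli_loglik(rows, candidate)
        grouped = binomial_kernel_loglik(y, n, candidate)
        # The two agree, so fitting on expanded rows or on counts gives the same estimates.
        assert math.isclose(expanded, grouped, rel_tol=1e-9)
```

    In practice this means the rows never need to be generated: software that accepts (vaccinated, not vaccinated) counts per clinic fits the same model on 60 rows per time point.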

    2) Suppose we have only clinic level data (60 clinics, 5 time points, outcomes are then rates between 0 and 1). We could leave it as such or even simplify to just 2 time points (pre-intervention, post-intervention) and possibly collapse further by just subtracting (post-pre). In either case, what would be the best way to model these rates? One suggestion was to take the log odds and simply use a linear model (or linear mixed model) on log-odds. Another was to use probit regression (is a mixed version possible?). Another was to use beta regression (or mixed?). Yet another was a quasi-binomial model, which I hadn't heard of previously but sounded interesting. Does anyone have experience with such data and/or know what type of model might be best in these kinds of situations? (That is, collapsed proportions for each of 60 clinics, with at least one, and possibly a handful of additional covariates of interest in this secondary aim.)
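
    For the quasi-binomial option, a minimal sketch of the underlying idea may help: fit the ordinary binomial model, estimate a dispersion factor from the Pearson chi-square, and inflate the standard errors by its square root. The example uses an intercept-only model and hypothetical clinic counts, not study data.

```python
import math

# Hypothetical clinic data: (patients, vaccinated); not from the study
clinics = [(1500, 600), (2200, 1210), (1800, 700),
           (3000, 1500), (2500, 1375), (1200, 420)]

def pooled_rate(data):
    """Maximum-likelihood rate for an intercept-only binomial model."""
    total_n = sum(n for n, _ in data)
    total_y = sum(y for _, y in data)
    return total_y / total_n

def pearson_dispersion(data):
    """Pearson chi-square divided by residual degrees of freedom."""
    p_hat = pooled_rate(data)
    chi2 = sum((y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)) for n, y in data)
    return chi2 / (len(data) - 1)

def standard_errors(data):
    """Binomial SE of the pooled rate and the quasi-binomial SE (scaled by sqrt(dispersion))."""
    p_hat = pooled_rate(data)
    total_n = sum(n for n, _ in data)
    se_binomial = math.sqrt(p_hat * (1 - p_hat) / total_n)
    se_quasi = se_binomial * math.sqrt(pearson_dispersion(data))
    return se_binomial, se_quasi

phi = pearson_dispersion(clinics)
se_bin, se_quasi = standard_errors(clinics)
print(f"pooled rate = {pooled_rate(clinics):.3f}")
print(f"dispersion = {phi:.1f}")
print(f"binomial SE = {se_bin:.4f}, quasi-binomial SE = {se_quasi:.4f}")
```

    A dispersion well above 1 signals that clinics vary more than binomial sampling alone would explain, which is the usual situation with thousands of patients per clinic. The point estimates are unchanged; only the standard errors widen.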

    Our goal is to come up with a strategy to propose to this group before we meet on Tuesday.

    Thanks for reading, and any help on either or both questions would be appreciated!

    Naomi



    ------------------------------
    Naomi Brownstein
    Associate Professor
    Medical University of South Carolina
    ------------------------------


  • 2.  RE: multilevel trial analysis plan second opinion

    Posted 05-25-2023 20:08

    Hi Naomi.
    I've used your Option 1 before. I'm not sure I'd call the binary outcomes you're describing "pseudo" data because you know the counts of 0s and 1s for each site. In the absence of other patient-level data, the order of the 0s and 1s doesn't matter. This approach allows you to fit a mixed logistic regression model or alternative (e.g., beta-binomial), taking clustering within clinic into account.
    It's trickier with multiple time points because you can't match patient outcomes across time points. I believe you could take the patient-level outcome at the final time point as your outcome variable and then include the clinic-level proportion at a previous time point (baseline, say) as a covariate in the model. Not ideal, but defensible. 
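
    A minimal sketch of the data layout this implies, assuming clinic summaries arrive as one record per clinic per time point (the field names and values below are hypothetical): final-time counts become the outcome and the baseline rate is carried along as a clinic-level covariate.

```python
# Hypothetical long-format clinic summaries: one record per clinic per time point
records = [
    {"clinic": "A", "time": 1, "patients": 1000, "rate": 0.40},
    {"clinic": "A", "time": 5, "patients": 1100, "rate": 0.50},
    {"clinic": "B", "time": 1, "patients": 2000, "rate": 0.55},
    {"clinic": "B", "time": 5, "patients": 1900, "rate": 0.60},
]

def build_analysis_rows(rows, baseline_time, final_time):
    """One row per clinic: final-time counts plus the baseline rate as a covariate."""
    baseline = {r["clinic"]: r["rate"] for r in rows if r["time"] == baseline_time}
    out = []
    for r in rows:
        if r["time"] != final_time:
            continue
        vaccinated = round(r["patients"] * r["rate"])  # counts recovered from size and rate
        out.append({
            "clinic": r["clinic"],
            "vaccinated": vaccinated,
            "not_vaccinated": r["patients"] - vaccinated,
            "baseline_rate": baseline[r["clinic"]],
        })
    return out

analysis = build_analysis_rows(records, baseline_time=1, final_time=5)
for row in analysis:
    print(row)
```

    Each row can then be passed to a grouped logistic model with a clinic random effect, with the intervention indicator and baseline_rate as covariates.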

    I'm not sure about Beta regression on the clinic-level proportions unless there's a way to incorporate the precision of the proportions (reflecting sample size in each clinic) into the model. 
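
    One way to carry clinic size into an analysis of clinic-level proportions, sketched here under assumptions rather than as a recommendation, is to work with empirical logits and weight each clinic by the inverse of its approximate sampling variance. The 0.5 correction and the variance formula are a common approximation; the clinic counts are hypothetical.

```python
import math

def empirical_logit(vaccinated, patients):
    """Empirical logit with a 0.5 correction so rates of 0 or 1 stay finite."""
    return math.log((vaccinated + 0.5) / (patients - vaccinated + 0.5))

def logit_variance(vaccinated, patients):
    """Approximate sampling variance of the empirical logit."""
    return 1.0 / (vaccinated + 0.5) + 1.0 / (patients - vaccinated + 0.5)

def weighted_mean_logit(data):
    """Inverse-variance weighted mean of clinic-level empirical logits."""
    weights = [1.0 / logit_variance(y, n) for n, y in data]
    logits = [empirical_logit(y, n) for n, y in data]
    return sum(w * z for w, z in zip(weights, logits)) / sum(weights)

# Hypothetical (patients, vaccinated) pairs: same 50% rate, very different sizes
small_clinic = (40, 20)
large_clinic = (4000, 2000)

for n, y in (small_clinic, large_clinic):
    print(f"n={n}: logit={empirical_logit(y, n):.3f}, variance={logit_variance(y, n):.5f}")
```

    The two clinics have the same logit but very different variances, so the large clinic receives far more weight. These weights reflect binomial sampling error only; they do not account for genuine between-clinic variation, which a random effect or dispersion parameter would still need to capture.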
    Finally, I'm not an expert on the topic, but epidemiologists sometimes use cluster-level methods for this kind of analysis (e.g., see Donner & Klar, American Journal of Epidemiology 1994; 140:279–289).
    I'll be curious to hear what ideas others have.
    Cheers,
    Vince



    ------------------------------
    Vincent Staggs, PhD
    Director, Biostatistics & Epidemiology Core, Children's Mercy Research Institute;
    Professor, School of Medicine, University of Missouri-Kansas City
    ------------------------------



  • 3.  RE: multilevel trial analysis plan second opinion

    Posted 05-25-2023 21:27

    To be sure I understand your problem: there are summary statistics at the clinic level but no individual patient data by clinic.
    That sort of problem is not uncommon in economics/econometrics, where it is termed the "aggregation problem".
    This paper by van Dijk and Paap, both econometricians, exemplifies that type of data and provides a Bayesian method for analysis.
    https://hal.science/hal-00520644/document
    This paper is an example of aggregation and not necessarily the paper most useful to you.
    You may have better luck finding more useful papers on the methods.

    Excerpt from the abstract:

    Empirical analysis of individual response behavior is sometimes limited due to
    the lack of explanatory variables at the individual level. In this paper we put forward
    a new approach to estimate the effects of covariates on individual response, where
    the covariates are unknown at the individual level but observed at some aggregated
    level. This situation may, for example, occur when the response variable is available
    at the household level but covariates only at the zip-code level.
    We describe the missing individual covariates by a latent variable model which
    matches the sample information at the aggregate level. Parameter estimates can
    be obtained using maximum likelihood or a Bayesian analysis. We illustrate the
    approach estimating the effects of household characteristics on donating behavior
    to a Dutch charity. Donating behavior is observed at the household level, while the
    covariates are only observed at the zip-code level.



    ------------------------------
    Chris Barker, Ph.D.
    2023 Chair Statistical Consulting Section
    Consultant and
    Adjunct Associate Professor of Biostatistics
    www.barkerstats.com


    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    ------------------------------



  • 4.  RE: multilevel trial analysis plan second opinion

    Posted 05-25-2023 22:53

    Hi - 

    I am curious about the specification of the hypothesis. With several time points and individual outcomes of 0 or 1, it looks like there is a stepped intervention that somehow reaches a stream of individuals within each treatment organization over time regarding individual decisions to reject or accept the medically indicated treatment. What is the intervention that is supposed to stimulate the acceptance response and so raise participation? How is it structured? Is the force of the stimulus to be varied over time? What is the theory of why it would be effective? How do you prevent the stimulus you apply to individuals being streamed through the first four organizations from also influencing individuals who are simultaneously being streamed through the second four organizations?

    The original design is very nice. Maybe some of the organizations can be worked with, separately by organization, to structure their intake forms so patients can opt in. It will be useful to even get one or two of the organizations to participate with the original design even if some do not. Possibly there is a way to work out cooperation with the level of administration above the individual organizations to sponsor the research. Generally, working with a funding agency, such as a large city health administration with clinics, such information is considered internal and essential to assess the effectiveness and efficiency of government operations, so access to data within government is frictionless for evaluation research.

    The objection from the community organizations does not make sense, though they may have the power to prevent access to the data. Keeping patient identifiers secret is a separate question from access to patient data for evaluation of medical services. I know this is not very helpful, but if the information you are developing is of value to the provision of medical services, then you need access to the data required to carry out the design.

    A dumbed down version of the research may or may not be worth the effort. If you have control over the stimulus and can vary its intensity over time and between organizations, the trick you describe with backing out information using the fact that '0-1' data are proportions may work well enough. Good luck with the project!



    ------------------------------
    Hugh Peach
    President
    H. Gil Peach & Associates, LLC
    ------------------------------