ASA Connect

  • 1.  Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-18-2023 15:41

    Dear ASA:

    Does anyone have a reference (text book or article) that describes how to calculate the power and sample size for:
    Binary data (Outcome is rate of a particular surgical procedure)
    Longitudinal over 6 years in increments of year
    Clustered by Hospital, approximately 1500 hospitals

    SAS preferred, although R or Stata also fine.



    ------------------------------
    Brandy Sinco, BS, MA, MS
    Statistician Senior
    Michigan Medicine
    ------------------------------


  • 2.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-19-2023 07:20

    The following resources contain dedicated chapters on power and sample size calculations for non-Gaussian data, including binary responses. I find both to be very useful resources. Good luck!

    Generalized Linear Mixed Models: Modern Concepts, Methods and Applications (Routledge & CRC Press)
    Generalized Linear Mixed Models: Modern Concepts, Methods and Applications presents an introduction to linear modeling using the generalized linear mixed model (GLMM) as an overarching conceptual framework. For readers new to linear models, the book helps them see the big picture.

    Analysis of Generalized Linear Mixed Models in the Agricultural and Natural Resources Sciences (Wiley)
    Generalized Linear Mixed Models in the Agricultural and Natural Resources Sciences provides readers with an understanding and appreciation for the design and analysis of mixed models for non-normally distributed data. It is the only publication of its kind directed specifically toward the agricultural and natural resources sciences audience.



    ------------------------------
    Nora Bello
    Professor
    The Ohio State University
    ------------------------------



  • 3.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-19-2023 08:43

    Nora Bello's suggestion to consult Walt Stroup's book is excellent. I believe I read this material when it came out in 2012, but, surprisingly, I don't have a copy to check my 74-year-old memory. (Sorry, Walt, my also o-l-d friend, including a couple of rounds of golf over the years.)

    I am, however, 98.6732% sure that he has long advocated using a basic "trick" I proposed >40 years ago called the "exemplary data method." In brief, you create an artificial data set that models your scenario for how Mother Nature and Lady Luck will conspire to produce the data for your proposed study. You then (1) feed these data to your favorite modeling software to produce the likelihood-ratio statistics of interest, then (2) turn those into non-centrality parameter values, and then (3) compute the power probabilities.

    If the exemplary dataset has N observations and gives a non-centrality value of lambda(N), then a study with cN observations has a non-centrality value of c*lambda(N). One should also do sensitivity analyses by changing the exemplary data in various ways to see how this affects the powers of interest. What all this does is force you to get very involved in the study design and to tailor your statistical plan to the specific research questions.
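
    The three steps and the c*lambda(N) scaling can be sketched in a few lines. This is only a minimal illustration under a deliberately simple scenario that is not the clustered longitudinal design asked about: two independent groups of equal size, assumed event rates of 0.30 and 0.40, no hospital effect, and a 1-df likelihood-ratio test. The function names and all numbers are hypothetical, and Python's standard library is used only to keep the sketch self-contained; the same steps carry over to SAS or R.

```python
from math import log, sqrt
from statistics import NormalDist


def lr_noncentrality(p1, p2, n_per_group):
    """Likelihood-ratio statistic G^2 for an exemplary 2x2 table.

    The 'data' are the expected cell counts under the assumed rates p1 and p2,
    compared with the null model of one common rate. G^2 is used as lambda(N).
    """
    pooled = (p1 + p2) / 2  # common rate under the null with equal group sizes
    g2 = 0.0
    for p in (p1, p2):
        for cell_p, null_p in ((p, pooled), (1 - p, 1 - pooled)):
            if cell_p > 0:  # an empty cell contributes nothing
                g2 += 2 * n_per_group * cell_p * log(cell_p / null_p)
    return g2


def power_1df(lam, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality lam.

    For 1 df the noncentral chi-square is (Z + sqrt(lam))^2, so the tail
    probability can be written with the standard normal distribution.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    root = sqrt(lam)
    return (1 - z.cdf(crit - root)) + z.cdf(-crit - root)


def multiplier_for_power(lam, target=0.80, alpha=0.05):
    """Approximate c such that c*lam reaches the target power (1-df test).

    Ignores the negligible far-tail term of the two-sided test.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(target)) ** 2 / lam


if __name__ == "__main__":
    lam = lr_noncentrality(0.30, 0.40, n_per_group=100)  # step (1) and (2)
    print("lambda(N):", round(lam, 3))
    print("power at N:", round(power_1df(lam), 3))       # step (3)
    c = multiplier_for_power(lam, target=0.80)
    print("c for 80% power:", round(c, 2))
    print("power at cN:", round(power_1df(c * lam), 3))
```

    Changing the assumed rates and rerunning is the sensitivity analysis described above. For a clustered, longitudinal design, the exemplary data would instead be fed to mixed-model software, and the resulting test statistic would take the place of `lr_noncentrality`.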

    The same approach is used in Section 6.5 of Agresti's book, Categorical Data Analysis, 2nd Edition. The example in 6.5.5 is quite simple, so it would be a good place to start; you can check his computations against yours to make sure you "get it." Those who want to dive more deeply might check out:

    Shieh, G. (2000). On power and sample size calculations for likelihood ratio tests in generalized linear models. Biometrics, 56(4):1192–6.

    Actually, I now advocate using study-specific Monte Carlo simulations to examine how a few tailored confidence intervals behave under reasonable exemplary dataset scenarios, but that's a topic for another day, week, month, year, or even a career.



    ------------------------------
    Ralph O'Brien
    Professor of Biostatistics (officially retired; still keenly active)
    Case Western Reserve University
    http://rfuncs.weebly.com/about-ralph-obrien.html
    ------------------------------



  • 4.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-19-2023 11:23
    Edited by Elizabeth Claassen 05-19-2023 11:24

    "Generalized Linear Mixed Models: Modern Concepts, Methods and Applications" by Walter Stroup has a whole chapter dedicated to power and sample size calculations in the GL(M)M context. Built-in sample size calculators will invariably underestimate the sample size required, so working through the steps provided (in SAS) in the book is recommended. It's pretty straightforward, and has been shown with simulation studies to be pretty accurate.

    I see now that Nora and Ralph have already put forth this suggestion! Nice to see you both! :)



    ------------------------------
    Elizabeth Claassen
    ------------------------------



  • 5.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-19-2023 11:23

    I'll second (third?) Stroup's book on Generalized Linear Mixed Models.  Also, I recommend Olvera Astivia, Gadermann, and Guhn (2019).  I don't think it handles the repeated measures bit, but you might find it helpful (and it links to a shiny app that works pretty nicely). 
    https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-019-0742-8



    ------------------------------
    Ben Fitzpatrick
    Loyola Marymount University
    ------------------------------



  • 6.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-22-2023 10:39

    Dear Brandy and SAS Users,

    If your study design is a multi-period cluster intervention trial (cohort or repeated cross-sectional, including parallel arms, cluster-crossover, or stepped wedge), or you otherwise have a binary cluster-level exposure variable of interest (either time-varying or constant over time) and are treating time as a categorical variable, the SAS macro CRTFASTGEEPWR, which computes power based on generalized estimating equations, can be used for binary, count, or continuous outcomes. Key references are:

    Zhang Y, Preisser JS, Li F, Turner EL, Rathouz PJ. %CRTFASTGEEPWR: A SAS macro for power of generalized estimating equations analysis of multi-period cluster randomized trials with application to stepped wedge designs. arXiv e-prints 2022: arXiv:2205.14532.

    Zhang Y, Preisser JS, Turner EL, Rathouz PJ, Toles M, Li F. A general method for calculating power for GEE analysis of complete and incomplete stepped wedge cluster randomized trials. Stat Methods Med Res 2023; 32(1): 71-87.



    ------------------------------
    John Preisser
    University of North Carolina
    ------------------------------



  • 7.  RE: Power for Generalized Linear Model, Binary Outcome, Longitudinal Data, Clustered by Hospital

    Posted 05-22-2023 13:31

    My bottom line in 2023 is that complex statistical planning is best done via Monte Carlo studies that capture the particular features of the study being designed. My favorite example was from maybe 30 years ago. When serving on an NIH study section, I reviewed a grant application for a smoking prevention study that proposed to randomize over 40 "independent" localities to two groups and study hundreds of high school kids within each locality. (No way are they independent!) Such studies have huge price tags. The statistical planning justifying potential bang for the research buck was grounded in a comprehensive Monte Carlo study that was detailed in a 50-page appendix.

    If your eventual data analysis will focus on key, tailored confidence intervals, then so can the Monte Carlo study. If it will be Bayesian, then you might focus on how key posterior distributions will vary based on your choice of priors. Of course, everything depends on how you model the different scenarios for how Mother Nature and Lady Luck will give you the actual data. Some will argue that this is all BS--and, unfortunately, way too many "power analyses" are--but going through a genuine statistical planning process helps focus the research questions, improve the research design, and tailor the analysis methods. This supports the Scientific Method and helps the team secure adequate funding.
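
    A small Monte Carlo sketch of this kind of planning, loosely shaped like the original question (binary outcome, 6 yearly periods, clustering by hospital), might look as follows. Everything in it is an assumption for illustration: the random-intercept logistic data model, the parameter values, and the function name are hypothetical. The analysis is also a swapped-in simplification: each hospital's yearly event proportions are reduced to a least-squares slope, and the mean slope is tested with a large-sample z-test, rather than fitting a GLMM or GEE model as one would in a real study.

```python
import random
from math import exp, sqrt
from statistics import NormalDist, mean, stdev


def simulate_power(n_hosp, n_per_year, b0, b1, sd_u,
                   n_years=6, n_sims=200, alpha=0.05, seed=1):
    """Estimate power to detect a time trend by simulation.

    Data model (assumed): logit P(event) = b0 + u_h + b1 * year, with a
    hospital random intercept u_h ~ Normal(0, sd_u).
    Analysis (simplified): per-hospital slope of the yearly proportion on
    year, then a z-test that the mean slope across hospitals is zero.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    years = list(range(n_years))
    t_bar = mean(years)
    sxx = sum((t - t_bar) ** 2 for t in years)

    rejections = 0
    for _ in range(n_sims):
        slopes = []
        for _ in range(n_hosp):
            u = rng.gauss(0.0, sd_u)  # hospital-level random intercept
            props = []
            for t in years:
                p = 1 / (1 + exp(-(b0 + u + b1 * t)))
                events = sum(rng.random() < p for _ in range(n_per_year))
                props.append(events / n_per_year)
            # Least-squares slope of proportion on year for this hospital
            slope = sum((t - t_bar) * y for t, y in zip(years, props)) / sxx
            slopes.append(slope)
        se = stdev(slopes) / sqrt(n_hosp)
        if se > 0 and abs(mean(slopes) / se) > crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Hypothetical scenario: 30 hospitals, 20 patients per hospital-year.
    for b1 in (0.0, 0.05, 0.10):
        est = simulate_power(n_hosp=30, n_per_year=20,
                             b0=-1.0, b1=b1, sd_u=0.5)
        print(f"b1 = {b1:.2f}: estimated power = {est:.2f}")
```

    Running the scenario with b1 = 0 gives a check on the type I error rate of the planned analysis, and varying sd_u, the hospital count, or the per-year sample size shows how sensitive the conclusions are to the assumed scenario. The same loop can record confidence-interval widths instead of rejections if the planned analysis centers on intervals.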



    ------------------------------
    Ralph O'Brien
    Professor of Biostatistics (officially retired; still keenly active)
    Case Western Reserve University
    http://rfuncs.weebly.com/about-ralph-obrien.html
    ------------------------------