ASA Connect

  • 1.  Power analysis for census tract level analysis

    Posted 01-19-2016 16:41

    I am designing a study where we will be using registry data to identify prevalent cases with a disease, and we will be able to identify the census tract in which they reside. The goal of the study is to estimate the variation in prevalence rates across census tracts in New York City. I was planning on doing this using a generalized linear random effects model. Even though the census tracts are fixed, I was planning on using a random effects model because we have over 2,100 census tracts in NYC, far too many groups to use fixed effects. Since this is part of a grant application, some sort of power analysis is required, and this is where I get tripped up. We are not sampling - we have all of the data for a particular year: # of cases and total population. The notion of power requires some idea of random sampling, so it does not seem to apply here. My plan was to conduct a power analysis using simulation with different scenarios of random effect variation - but it is not obvious what should vary from iteration to iteration - the census tract level population? the census tract disease prevalence? the census tract random effect? And what would it really mean if these quantities vary from iteration to iteration? I'm not sampling different census tracts. If anyone can make any suggestions, I would be much obliged.

    ------------------------------
    Keith Goldfeld
    NYU School of Medicine
    ------------------------------


  • 2.  RE: Power analysis for census tract level analysis

    Posted 01-20-2016 08:11

    Dear Keith:

    In small area analysis of disease incidence or prevalence, we typically treat the assignment of cases to individuals as the random process (i.e., cases are randomly selected from the at-risk population).  In this setting the census population in each tract remains fixed and the cases are randomly assigned to individuals (or, equivalently, to tracts proportional to the population size of each tract).  This is an "equal risk" null model of disease occurrence.  Some choices to make include:  Do you simulate the same number of cases each time, or do you simulate according to the overall risk and allow the total number of cases to vary year to year?  Since you are interested in prevalence, you will need to decide how many incident cases to add to the group each year and how many cases to remove due to death or cure.  For power, it depends on what your alternative is.  Are you looking for deviations from an equal risk hypothesis (every person has the same risk)?  If so, simulating patterns from the alternative (e.g., higher risk near an industrial site) can be done by assigning the appropriate case probabilities as a function of exposure and associated relative risk.  Carol Gotway and I have a book on "Applied Spatial Statistics for Public Health Data" where we give examples of this including hypothesis tests for geographic clustering and spatial random effects models.
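
    The assignment of cases to tracts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the total number of cases is held fixed, cases are drawn with replacement (a multinomial approximation that is reasonable when the disease is rare), and the function name, the tract populations, and the relative risks are all hypothetical. Leaving out the relative risks gives the equal-risk null model; supplying them gives one possible alternative.

```python
import random
from collections import Counter


def simulate_case_counts(populations, n_cases, relative_risks=None, seed=None):
    """Assign n_cases to tracts with probability proportional to
    population x relative risk. relative_risks=None is the equal-risk null."""
    if relative_risks is None:
        relative_risks = [1.0] * len(populations)
    # Each tract's weight is its share of the (risk-weighted) population.
    weights = [pop * rr for pop, rr in zip(populations, relative_risks)]
    rng = random.Random(seed)
    # Draw the tract index for every case; the tract populations stay fixed.
    tract_of_case = rng.choices(range(len(populations)), weights=weights, k=n_cases)
    tally = Counter(tract_of_case)
    return [tally.get(i, 0) for i in range(len(populations))]


# Hypothetical tract populations, for illustration only.
populations = [4200, 3100, 5000, 2700]

# One realization under the equal-risk null.
null_counts = simulate_case_counts(populations, n_cases=150, seed=1)

# One realization under an alternative where the third tract has double the risk.
alt_counts = simulate_case_counts(
    populations, n_cases=150, relative_risks=[1.0, 1.0, 2.0, 1.0], seed=1
)
```

    Repeating the draw many times under the null and under the alternative gives the two reference distributions a simulated power calculation needs.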

    ------------------------------
    Lance Waller
    Professor
    Emory University



  • 3.  RE: Power analysis for census tract level analysis

    Posted 01-20-2016 10:09

    Lance -

    Thanks for your note. I am familiar with your book - and have been meaning to read through it. If we get the grant, I will definitely make sure to do that!

    I left out a key piece of information in describing the problem. We are looking at deaths from a particular disease and are interested in whether they received treatment A or treatment B prior to death. We know that in different parts of the city more folks get treatment A than B. Our goal is to identify clusters where treatment A predominates so that we can compare them to clusters where treatment B predominates. Ideally, these clusters would be as similar as possible except for their outcomes with respect to A and B. The null hypothesis is that there is no variation across the spatial landscape (after adjusting for covariates). The alternative is that there is variation. We will have data over time, but we believe that patterns of A and B do not change over time. Certainly the number of deaths changes in particular areas, as do particular treatments - but the underlying intensities do not.
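
    One way to write these hypotheses down, offered as a sketch only: assume a random-intercept logistic model for the number of deaths in tract i that were preceded by treatment A. The symbols below (y_i for treatment-A deaths, n_i for all deaths, x_i for tract covariates, b_i for the tract effect, sigma^2 for its variance) are illustrative and not taken from the thread, and the independent normal tract effect ignores any residual spatial correlation.

```latex
y_i \mid b_i \sim \mathrm{Binomial}(n_i, p_i), \qquad
\operatorname{logit}(p_i) = x_i^\top \beta + b_i, \qquad
b_i \sim N(0, \sigma^2)

H_0 : \sigma^2 = 0 \quad \text{versus} \quad H_1 : \sigma^2 > 0
```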

    - Keith

    ------------------------------
    Keith Goldfeld
    NYU School of Medicine



  • 4.  RE: Power analysis for census tract level analysis

    Posted 01-21-2016 11:02
    Hi Keith:

    That's an interesting question. If you have the proportion receiving A and proportion receiving B in each tract, you can treat either as a covariate in your generalized linear mixed model. Or if you have assignments of each tract into the A group and B group you could use these indicators as covariates.

    I'd still advocate Monte Carlo power analyses since you could generate the patterns/differences you would like power to detect. Running the GLMMs would take some time, but I like the control one has over the power questions in a simulation setting over various approximations. If pressed for time, you could review options for GLMM power analyses (although these may not adjust for residual spatial correlation...).
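
    A minimal sketch of such a Monte Carlo power calculation follows. It makes several assumptions that are not in the thread. Tract effects are independent normal draws on the logit scale (no spatial correlation), the number of deaths per tract is held fixed, and there are no covariates. To keep the example short and dependency-free, a Pearson chi-square homogeneity statistic stands in for fitting the GLMM on each simulated data set. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_tract_counts(deaths, mu, sigma, rng):
    """Simulate treatment-A deaths per tract under a random-intercept logistic model."""
    counts = []
    for n in deaths:
        # Tract-specific probability of treatment A: logit(p) = mu + b, b ~ N(0, sigma^2).
        p = 1.0 / (1.0 + math.exp(-(mu + rng.gauss(0.0, sigma))))
        # Binomial(n, p) draw as a sum of Bernoulli trials.
        counts.append(sum(rng.random() < p for _ in range(n)))
    return counts


def homogeneity_statistic(counts, deaths):
    """Pearson chi-square statistic for equal treatment-A proportions across tracts."""
    total = sum(deaths)
    if total == 0:
        return 0.0
    p_hat = sum(counts) / total
    if p_hat in (0.0, 1.0):
        return 0.0
    return sum(
        (a - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
        for a, n in zip(counts, deaths)
        if n > 0
    )


def estimate_power(deaths, mu, sigma, n_sims=500, alpha=0.05, seed=1):
    """Estimate power to detect between-tract variance sigma^2 at level alpha."""
    rng = random.Random(seed)

    def simulated_stats(sd):
        return [
            homogeneity_statistic(simulate_tract_counts(deaths, mu, sd, rng), deaths)
            for _ in range(n_sims)
        ]

    # Critical value: empirical (1 - alpha) quantile of the statistic under sigma = 0.
    null_stats = sorted(simulated_stats(0.0))
    cutoff_index = min(n_sims - 1, math.ceil((1 - alpha) * n_sims) - 1)
    critical_value = null_stats[cutoff_index]

    # Power: share of data sets simulated under the alternative that exceed it.
    alt_stats = simulated_stats(sigma)
    return sum(s > critical_value for s in alt_stats) / n_sims


# Hypothetical scenario: 50 tracts with 20 deaths each, half receiving A on average.
power = estimate_power([20] * 50, mu=0.0, sigma=0.5, n_sims=200)
```

    Rerunning `estimate_power` over a grid of `sigma` values traces out a power curve. Swapping the statistic for a test based on the fitted GLMM, and the tract effects for spatially correlated ones, would bring the sketch closer to the planned analysis at the cost of run time.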

    Best wishes!

    Lance



  • 5.  RE: Power analysis for census tract level analysis

    Posted 01-20-2016 09:14

    Keith, this is an interesting problem.

    All I have for now are some questions that came to mind after reading your post.

    Why would variation in disease prevalence at the census tract level be interesting to uncover from a clinical perspective? Is that type of variation something that can be factored in when treating the disease? How else would knowledge of that variation affect clinical or epidemiological practice? 

    Is the disease itself common or rare? 

    Can you clarify if you'll have access to multiple years' worth of data and also if you are planning to include any other explanatory variables in your model?

    The number of years you will consider in your study may determine whether you can treat the underlying population as stable from year to year.  A small number of years may justify assuming a fairly stable population. 

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    isabella@ghement.ca



  • 6.  RE: Power analysis for census tract level analysis

    Posted 01-20-2016 10:17

    Isabella -

    Thanks for your note. I provided a little more detail in response to Lance - but can repeat some of it here.

    Why would variation in disease prevalence at the census tract level be interesting to uncover from a clinical perspective? Is that type of variation something that can be factored in when treating the disease? How else would knowledge of that variation affect clinical or epidemiological practice? 

    We are looking at deaths from a particular disease and are interested in whether they received treatment A or treatment B prior to death. We know that in different parts of the city more folks get treatment A than B. Our goal is to identify clusters where treatment A predominates so that we can compare them to clusters where treatment B predominates. Ideally, these clusters would be as similar as possible except for their outcomes with respect to A and B.

    Is the disease itself common or rare? 

    Deaths are pretty rare - less than 1% of the population. But NYC has a large population, so there are a considerable number of cases citywide. And we are interested really not so much in the number of deaths, but in comparing the numbers of those with treatment A vs B, where both treatments are not rare.

    Can you clarify if you'll have access to multiple years' worth of data and also if you are planning to include any other explanatory variables in your model?

    Yes, we will have multiple years - maybe as many as 10. Even though 10 is a lot, we believe that the patterns of A vs B are fairly stable, though of course we will be finding that out.

    - Keith

    ------------------------------
    Keith Goldfeld
    NYU School of Medicine



  • 7.  RE: Power analysis for census tract level analysis

    Posted 01-22-2016 01:05

    Thanks for your detailed answer, Keith!  Setting aside your power question for the time being, here is what I am thinking. 

    Patients in your study get either treatment A or treatment B for their disease. Not knowing anything about the disease you are studying, I wonder how quickly after they receive treatment these patients die. I also wonder whether, before they die, these patients can receive a single treatment once (e.g., A), a single treatment repeatedly over time (e.g., A, A, A) or various sequences of combined treatments A and B over time (e.g., A, B, B, A, B, etc.). 

    In any event, it seems to me that receiving a treatment (or sequence of treatments) only tells half of the story.  The other half of the story has to do with whether the treatment (or sequence of treatments) had the capacity to prolong these patients' lives from the first time it was administered.  If 90% of the patients in a particular census tract received treatment A once during the study (say) but all of these patients died within a month of receiving this treatment - is that enough information for someone to be able to make clinical decisions? However, if the remaining 10% of the people in that census tract - who received treatment B - died within 5 years of treatment initiation, then one would have to wonder whether future patients (with similar covariates) should be switched to treatment B.

    Perhaps there is some geographic factor that differentiates the two treatments (e.g., treatment A is less expensive and is prescribed to those in poorer regions of New York). But if no information is available on how the treatments help prolong life, it might be hard to compare percentages of people who (last) received treatment A or treatment B. 

    Lance is much better qualified than I am to provide guidance on the power calculations. But I think the study findings should be amenable to clinical decision-making - unless perhaps I don't understand other aspects of this study.

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.