
  • 1.  Survival analysis or negative binomial regression?

    Posted 06-08-2015 16:37

    I would be grateful for your input on the analytic approach for this study:

    Research assistants posing as patients called clinics to schedule an appointment with a specialist.  The outcome is the number of days until the appointment (median = 40 days, range 3-208 days).   There are several exposures of interest (some are categorical, some are continuous).  For example, we want to test whether number of days until the appointment is significantly greater for group A than group B after adjusting for confounders. 

    About 15% of clinics would not schedule an appointment (10% were not accepting new patients, 5% had a wait list).  We assume that patients would have to wait an excessively long time for an appointment at these clinics, but the number of days they would have to wait is unknown. 

    I am considering two analytic approaches:

    Approach 1:  include all clinics, treat the 15% of clinics that could not schedule an appointment as right censored at 210 days (2 days beyond the maximum value observed in the data), and use survival analysis (Cox PH model) to analyze the data.

    Approach 2:  include only those clinics where the number of days until the appointment is known, and model the data using negative binomial regression (there is overdispersion).

    Which approach do you recommend?  Do you have other suggestions about how to analyze this data?

    Thank you in advance for your help.

    ------------------------------
    Amy Storfer-Isser
    Statistical Research Consultants, LLC
    ------------------------------



  • 2.  RE: Survival analysis or negative binomial regression?

    Posted 06-08-2015 17:05

    Assuming that clinics not currently accepting new patients will eventually accept them but are too busy now, the fact that they are not currently accepting new patients provides information about the distribution of time to appointment, so survival analysis is more appropriate. Clinics KNOWN never to accept new patients again (for example, because of a planned closure due to retirement) can be excluded. Analyzing time until appointment only in clinics accepting new patients would be like analyzing time to death only among patients who die, ignoring censoring, in a survival study.
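    A minimal sketch of this point, using made-up wait times rather than the study data: the median among scheduled clinics only is compared with a Kaplan-Meier median that keeps the non-scheduling clinics as right censored at day 210. The function name `km_median` and all numbers are illustrative assumptions.

```python
from statistics import median

def km_median(times, scheduled):
    """Kaplan-Meier median: first time at which the estimated S(t) is 0.5 or below.

    times: days to appointment, or the censoring day when no appointment was made.
    scheduled: True if an appointment was made, False if right censored.
    Returns None if the estimated curve never reaches 0.5.
    """
    at_risk = len(times)
    surv = 1.0
    for t in sorted(set(times)):
        events = sum(1 for x, s in zip(times, scheduled) if x == t and s)
        censored = sum(1 for x, s in zip(times, scheduled) if x == t and not s)
        if events:
            surv *= 1 - events / at_risk
            if surv <= 0.5:
                return t
        at_risk -= events + censored
    return None

# Hypothetical data: 11 clinics that scheduled, 2 that did not (censored at day 210)
waits = [3, 12, 21, 30, 36, 40, 47, 60, 85, 130, 208]
all_times = waits + [210, 210]
all_flags = [True] * len(waits) + [False] * 2

naive_median = median(waits)            # scheduled clinics only
km = km_median(all_times, all_flags)    # all clinics, with censoring
print(naive_median, km)
```

    With these invented numbers the median moves from 40 days (scheduled clinics only) to 47 days once the censored clinics are kept in the risk set.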

    David

    ------------------------------
    David Bristol
    Statistical Consulting Services
    ------------------------------




  • 3.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 09:45

    Your data involve a mixture of distributions: part of the population of clinics is not open to appointments (a discrete component); the other part is willing to schedule appointments (a continuous component).  This is somewhat similar to the situation in zero-inflated (or zero-deflated) Poisson modeling.

    Also, the recorded days to appointment are continuous time-to-event data (without censoring) rather than count data. Start with the generalized gamma distribution in SAS PROC LIFEREG and see whether it can be simplified to a gamma or Weibull distribution.
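    As a rough illustration of the simpler end of that model family, here is a sketch of a Weibull maximum-likelihood fit for fully observed waiting times, written in plain Python rather than SAS. It does not fit the generalized gamma; the function name `fit_weibull`, the search bounds, and the simulated sample are assumptions for illustration only.

```python
import math
import random

def fit_weibull(waits, lo=0.01, hi=20.0, tol=1e-8):
    """MLE of the Weibull shape k and scale lam for uncensored positive times."""
    logs = [math.log(w) for w in waits]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for the shape; it is increasing in k
        # and its root is the MLE.
        powers = [w ** k for w in waits]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    # Bisection on the shape parameter (assumes the root lies in [lo, hi])
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2
    lam = (sum(w ** k for w in waits) / len(waits)) ** (1.0 / k)
    return k, lam

# Simulated waiting times (not the study data): true shape 1.5, true scale 50
rng = random.Random(1)
sample = [rng.weibullvariate(50, 1.5) for _ in range(2000)]
shape_hat, scale_hat = fit_weibull(sample)
print(round(shape_hat, 2), round(scale_hat, 1))
```

    A shape estimate close to 1 would suggest that an exponential model is adequate; covariates would still need a regression formulation such as the one PROC LIFEREG provides.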

    ------------------------------
    Qing Kang
    Chief Scientist
    Statistical Intelligence Group, LLC
    ------------------------------




  • 4.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 10:04
    My first thought is that you would want to limit your analysis to those practices that are accepting new patients.  Rather than assume a negative binomial distribution, my inclination would be to transform the data to mitigate an anticipated right skew - e.g., a log or square root transformation.
     
    Relative to practices being closed to new patients, if there is a concern that Group A could find practices closed at a different rate than Group B, I would examine that probability in a separate analysis.
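    A minimal sketch of the log-transformation idea, under the assumption of two unadjusted groups and invented wait times: means are compared on the log scale and back-transformed to a ratio of geometric means. The function name and the normal-approximation interval are illustrative choices; an adjusted analysis would instead regress log wait time on the exposures and confounders.

```python
import math
from statistics import mean, variance

def geometric_mean_ratio(waits_a, waits_b, z=1.96):
    """Ratio of geometric means (A / B) with an approximate 95% CI.

    Works on log(wait), so all waits must be positive.
    """
    log_a = [math.log(w) for w in waits_a]
    log_b = [math.log(w) for w in waits_b]
    diff = mean(log_a) - mean(log_b)
    # Unequal-variance standard error of the difference in log-scale means
    se = math.sqrt(variance(log_a) / len(log_a) + variance(log_b) / len(log_b))
    return math.exp(diff), math.exp(diff - z * se), math.exp(diff + z * se)

# Hypothetical wait times in days for clinics that scheduled an appointment
group_a = [14, 25, 38, 52, 66, 95, 180]
group_b = [10, 20, 35, 40, 60, 90, 150]

ratio, ci_low, ci_high = geometric_mean_ratio(group_a, group_b)
print(round(ratio, 2), round(ci_low, 2), round(ci_high, 2))
```

    A ratio of 1.2, for instance, would read as "wait times in group A are about 20% longer than in group B on the geometric-mean scale".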
     
    Best regards,
    Michael Morton





  • 5.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 11:26
    Dear Amy,

    You don't have survival times, you have waiting times, all observed more or less contemporaneously. This distinction is central in choosing a method. Regression methods useful for survival times might be useful, but only as a convenience, not because the measurements require survival methods (see below). Similarly, you don't have count data, so regression methods (Poisson, negative binomial) for count data might be useful, but again, only as a convenient kind of regression.
     
    You didn't say anything about the source of the list of clinics. If these are unrelated to each other, then information from the clinics who are willing to make an appointment (waiting times) doesn't seem to provide any information about the clinics who didn't make an appointment. In other words, treating them as censored in a survival-like model seems uninformative and inappropriate.

    If these clinics are from a single organization with all or most policies the same, then the clinics that made appointments provide some information about the rest of the clinics in their organization. Examples of organizations might be veterans affairs (VAMCs) or a large provider such as Kaiser-Permanente. Also, in the UK there are national policies on waiting times for an appointment, so all NHS providers could be included, but these statistics are tracked by the NHS and violations of policies are often in the news.

    There are few such organizations that seem large enough to qualify, so I think that treating the clinics that refused to make an appointment as censored would not be informative and would be confounding. By this, I mean that they can influence the parameter estimates in any model for waiting times though the actual causes of refusal are different.

    I conclude that it would be more informative to treat the refusal to make an appointment as a separate issue from the waiting time. That is, model the willingness to make an appointment using one method (for binomial data) and the waiting time for scheduled appointments with another method (for more or less continuous data).
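    As a sketch of the first part of such a two-part analysis, the unadjusted analogue of a logistic regression on "appointment scheduled" is an odds ratio from a 2x2 table. The counts below are invented (150 clinics split evenly, about 15% not scheduling), and the function name and the Wald interval are illustrative assumptions.

```python
import math

def odds_ratio(sched_a, total_a, sched_b, total_b, z=1.96):
    """Odds ratio (A vs. B) of scheduling an appointment, with a Wald 95% CI.

    Requires every cell of the 2x2 table to be nonzero.
    """
    a, b = sched_a, total_a - sched_a   # group A: scheduled, not scheduled
    c, d = sched_b, total_b - sched_b   # group B: scheduled, not scheduled
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(estimate) - z * se_log)
    high = math.exp(math.log(estimate) + z * se_log)
    return estimate, low, high

# Hypothetical counts: 60 of 75 clinics scheduled in group A, 68 of 75 in group B
or_hat, or_low, or_high = odds_ratio(60, 75, 68, 75)
print(round(or_hat, 2), round(or_low, 2), round(or_high, 2))
```

    The second part, the waiting time among clinics that did schedule, would then be modeled separately with whichever continuous-outcome method is chosen.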

    There is no survival as such and there isn't the usual idea of a hazard here (all the wait times are observed more or less simultaneously), so other regression methods may be useful. I am thinking of a transformation of the wait times or a different distributional link function. In that vein, a distribution used in survival analysis, such as the Weibull, might work quite well, but with no interpretation of the waiting times as survival times. Some regression methods for count data might also be informative, but with no such interpretation. In view of this, nonparametric proportional hazards methods appear to be inappropriate.
     
    Since appointments are usually filled for at least the next few days, except for some time that might be reserved for emergencies, the frequency of short waiting times might actually be reduced, that is, deflated. Short wait times might not be informative at all about your main questions of interest, since they might be haphazard.
     
    There is also a question of seasonality in wait times, as any parent who tried to make an appointment with an orthodontist knows. Many doctors are much busier during school vacations. This raises the question of how long it took to collect the data and whether seasonality might be important in the analysis.

    Graphical methods should not be neglected. They might provide useful information for the selection of a regression model for waiting times.

    Can you divulge the number of clinics, calls per clinic, and how long the data collection took?
     
    Regards,

    David Smith


  • 6.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 18:44

    Thank you David B, Qing, Michael and David S for your insightful comments.

    David S - to answer your queries - there are about 150 clinics and they are independent.  Examining differences by season is one of our primary objectives.

    Most of the feedback I received indicates that "able to make an appointment" should be the outcome of one analysis (e.g., logistic regression model), and "wait time for an appointment" should be a separate analysis.  There is a lack of consensus about the type of model that is appropriate for the wait time analysis... given the target audience, it would be helpful if I can keep it simple, and ideally be able to interpret the results as, "median wait time for an appointment was XX days longer in the fall than winter".  With that in mind, I welcome additional feedback about model selection.
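    One way to get a statement of that form, sketched here with invented data and no covariate adjustment, is a difference in medians with a percentile bootstrap interval. The function name, the number of resamples, and the wait times are assumptions; median (quantile) regression would be the usual route if confounders must be adjusted for.

```python
import random
from statistics import median

def median_diff_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Difference in medians (x minus y) with a percentile bootstrap interval."""
    rng = random.Random(seed)
    point = median(x) - median(y)
    # Resample each season independently, with replacement
    diffs = sorted(
        median(rng.choices(x, k=len(x))) - median(rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    low = diffs[int(n_boot * alpha / 2)]
    high = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    return point, low, high

# Hypothetical wait times in days for clinics that scheduled an appointment
fall = [30, 42, 45, 51, 58, 63, 70, 88, 120]
winter = [10, 21, 28, 33, 38, 41, 47, 55, 90]

diff, diff_low, diff_high = median_diff_ci(fall, winter)
print(f"median wait was {diff} days longer in the fall "
      f"(95% bootstrap CI {diff_low} to {diff_high})")
```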

    Thanks again,

    Amy

    ------------------------------

    Amy Storfer-Isser
    Statistical Research Consultants, LLC
    ------------------------------




  • 7.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 18:57

    Hello Amy—

    My 2 cents: It depends on what you want to learn. Do you want to learn about clinics accepting and clinics not accepting new patients, or do you want to learn about the length of time a person waits between the call and the scheduled appointment?

    If it is the latter, then you should have screened out those clinics not accepting new patients before making calls if randomization was involved in assigning callers to clinics.  At this point, it becomes a missing data problem and no censoring applies to the outcome.   

    If it is the former, then I agree with Qing Kang that you have a mixture of clinics and should use a two-part model.  Your outcome is a waiting time, not a survival time.

    Thanks for your query—

    Marilyn

    Marilyn Stolar

    Associate Research Scientist

    Yale Center for Analytical Sciences

    Yale School of Public Health, Biostatistics Dept


    ------------------------------
    Marilyn Stolar
    ------------------------------