ASA Connect

Survival analysis or negative binomial regression?

  • 1.  Survival analysis or negative binomial regression?

    Posted 06-08-2015 16:28

    I would be grateful for your input on the following study:

    Research assistants posing as patients called clinics to schedule an appointment with a specialist.  The outcome is the number of days until the appointment (median = 40 days, range 3-208 days).   There are several exposures of interest (some are categorical, some are continuous).  For example, we want to test whether number of days until the appointment is significantly greater for group A than group B after adjusting for confounders. 

    About 15% of clinics would not schedule an appointment (10% were not accepting new patients, 5% had a wait list).  We assume that patients would have to wait an excessively long time for an appointment at these clinics, but the number of days they would have to wait is unknown. 

    I am considering two analytic approaches:

    Approach 1:  include all clinics, treat the 15% of clinics that could not schedule an appointment as right-censored at 210 days (2 days beyond the maximum value observed in the data), and use survival analysis (Cox PH model) to analyze the data.

    Approach 2:  include only those clinics where the number of days until the appointment is known, and model the data using negative binomial regression (there is overdispersion).
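
    As a rough illustration of how the two approaches differ in the data they use, here is a minimal sketch. The toy wait times, the `None` coding for clinics that would not schedule, and the function names are all assumptions made for illustration; they are not the study data.

```python
# Hypothetical toy data: days until appointment, or None when the clinic
# would not schedule (not accepting new patients or wait list only).
CENSOR_DAY = 210  # 2 days beyond the observed maximum of 208, as in Approach 1

toy_days = [3, 12, 40, 40, 95, 208, None, None]


def code_for_survival(days_list, censor_day=CENSOR_DAY):
    """Approach 1: keep every clinic as a (time, event) pair.

    event = 1 when an appointment date was observed,
    event = 0 when the clinic is treated as right-censored at censor_day.
    """
    return [(d, 1) if d is not None else (censor_day, 0) for d in days_list]


def code_for_count_model(days_list):
    """Approach 2: keep only clinics with a known number of days."""
    return [d for d in days_list if d is not None]


survival_rows = code_for_survival(toy_days)
count_rows = code_for_count_model(toy_days)

print(len(survival_rows), "clinics enter the time-to-event analysis")
print(len(count_rows), "clinics enter the count regression")
```

    The point of the sketch is only that Approach 1 keeps all clinics and relies on the chosen censoring day, while Approach 2 silently drops the clinics with no appointment.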

    Which approach do you recommend?  Do you have other suggestions about how to analyze this data?

    Thank you in advance for your help.

    ------------------------------
    Amy Storfer-Isser
    Statistical Research Consultants, LLC
    ------------------------------



  • 2.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 07:19

    My thought is that you don't want to use survival analysis because it doesn't model the situation well. It's not that each requester is at some risk of having an appointment happen at any given time. Rather, the time-to-appointment is fixed at the time the requester contacts the clinic, with that time being your outcome of interest.

    I'm interested in what others have to say; I'd say, first, you want to be sure that not having appointments available truly is independent of requester characteristics (that is, the clinic doesn't pretend to not have appointments for requesters they don't like, or squeeze in those they like). Given that, you can only analyze data from the clinics that in fact could give appointments -- this is your study population, other clinics can't meet your inclusion criteria. Finally, if you want to generalize to more clinics -- including those with wait lists or not accepting new patients -- you have to examine whether this is appropriate. Do full-up clinics tend to serve different populations, offer different types or quality of care, or be more popular for various reasons? And when you generalize, mention how these differences might affect your predictions.






  • 3.  RE: Survival analysis or negative binomial regression?

    Posted 06-10-2015 11:10
    I am absolutely uncomfortable with not using all of the data (Approach 2).

    I wish people would stop using the expression "survival analysis" when doing time-to-event analysis when the event is not death (or something reasonably comparable).  Not only does the vocabulary seem to me to be wrong in itself, but it seems to confuse non-statisticians.  I have done a fair amount of time-to-event analysis with right censoring, but not survival analysis per se.

    I would like to see the data plotted before committing to an analysis, unless there was a previous pilot study, in which case I would like to see a thorough analysis of the pilot study before writing the analysis plan for the current study (a plan that ideally should have been written before the study started).  If there was no previous pilot study, then this study is the pilot study and is open to all sorts of exploratory data analysis.

    I would like to know what past research has suggested should be included as covariates.





  • 4.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 07:46

    Before using option 1 (right censor), I would apply logistic regression (same predictor set) to test for differences between clinics at which an appointment was made and those at which no appointment could be made. (There might be something "special" about clinics with no patient openings, something that makes them different from clinics with long wait times.) If there are no differences, then go ahead and follow option 1. If there are differences, then follow option 1 using only clinics at which an appointment could be made (no right censoring--remove "outliers") with an additional report of the appointment v. no-appointment differences.

    ------------------------------
    Francis Dane
    Chair
    Jefferson College of Health Sciences
    ------------------------------




  • 5.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 08:21

    I agree with Eric Cohen that Approach 1 is not appropriate for your data.

    Another approach would use a 3-part model, one part for each of the three outcomes: not accepting new patients, wait list, and days until the appointment. This approach has some kinship with hurdle models, which have two parts, a (binomial) logistic regression for whether the "hurdle" is crossed and a negative-binomial model (for example) for the number of days.  Your data would need a multinomial logistic regression for the first two parts.  I have not looked for this particular model in the literature, and it may not be already implemented in software.  You could combine the first two parts, especially if the characteristics are similar between practices that are not accepting new patients and practices that have a wait list.  But, having written this, it seems to me that your situation is less complicated.

    Hurdle models and related zero-inflated models provide a way of handling sources of zero counts that do not fit into the framework of the simpler models.  Since the range of your data is 3 to 208 days, you do not have any observations of zero days.  Indeed, such observations seem unlikely, and you may want to exclude that possibility.  Thus, you should be able to do two separate analyses: a multinomial (with three categories, perhaps regarded as ordered, for the three parts) and a Poisson or, more likely, negative binomial for the number of days until the appointment.
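
    A minimal sketch of the two separate pieces described above, using only descriptive summaries rather than fitted regressions. The toy records and names are hypothetical, and the dispersion figure is a simple method-of-moments estimate under the common NB2 form Var(Y) = mu + alpha * mu^2, not a fitted negative-binomial regression.

```python
from collections import Counter
from statistics import mean, variance

# Hypothetical toy records, one per clinic call: either the number of days
# until the appointment or a label for why no appointment was given.
calls = [3, 12, 40, 40, 95, 208, "wait_list", "not_accepting"]


def outcome_category(record):
    """Map a record to one of the three outcome categories."""
    return "appointment" if isinstance(record, int) else record


def nb_dispersion_mom(counts):
    """Method-of-moments dispersion alpha for Var(Y) = mu + alpha * mu**2.

    A value near 0 is consistent with Poisson; a clearly positive value
    suggests overdispersion.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return (v - m) / m ** 2


# Part 1: the three-category outcome (input to a multinomial model).
category_counts = Counter(outcome_category(r) for r in calls)

# Part 2: days until appointment among clinics that scheduled one.
days = [r for r in calls if isinstance(r, int)]
alpha_hat = nb_dispersion_mom(days)

print(dict(category_counts))
print("moment estimate of dispersion:", round(alpha_hat, 3))
```

    In a real analysis each part would be a regression on the exposures and confounders; the sketch only shows how the data split into the two parts.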


    ------------------------------
    David Hoaglin
    ------------------------------




  • 6.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 09:58

    It is a very interesting question. In general the Cox model should be more powerful (give narrower confidence intervals) since it takes into account the time to event in addition to the number of events. I think you would need to assume that the 85% of clinics that actually provided an appointment time would also have been censored had the next available time slot for the specialist exceeded 210 days. That would make the censoring administrative and hence non-informative.

    ------------------------------
    Cyrus Mehta
    Cytel, Inc.
    ------------------------------




  • 7.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 10:02

    Good question. I'd go with survival analysis, partly because I'm more familiar with it, but also because these are time-to-event data, not repeated trials. If you called back repeatedly, you might catch cancellations or such, and get different results. There are lots of good diagnostic tools for Cox regression (e.g., cox.zph() in R), and if the model fits poorly there are many alternative survival models.

    I'm uncomfortable with the "no appointments" though.  There are a lot of them. Perhaps you could have asked how long people stay on the wait list, typically, or when it might be ok to call again.  At least, do a sensitivity analysis, censoring from 1 day up to a couple of years. 
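
    One way such a sensitivity analysis could be sketched is to recompute a Kaplan-Meier median while varying the day at which the "no appointment" clinics are censored. The toy data below are hypothetical (and deliberately have a larger share of unscheduled clinics than the 15% in the study, to make the effect visible); exact fractions are used so the 0.5 threshold is not disturbed by rounding.

```python
from fractions import Fraction


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Everyone with this time (event or censored) is still at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= Fraction(n_at_risk - n_events, n_at_risk)
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve


def km_median(times, events):
    """Smallest event time with S(t) <= 1/2, or None if never reached."""
    for t, s in kaplan_meier(times, events):
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical toy data: 6 scheduled appointments, 4 clinics with none.
observed_days = [3, 12, 40, 40, 95, 208]
n_unscheduled = 4

medians = {}
for censor_day in (1, 30, 210, 730):
    times = observed_days + [censor_day] * n_unscheduled
    events = [1] * len(observed_days) + [0] * n_unscheduled
    medians[censor_day] = km_median(times, events)

for censor_day, med in medians.items():
    print(f"censor at day {censor_day}: KM median = {med}")
```

    If the estimates move noticeably as the censoring day changes, as they do in this toy example, the conclusions depend on an assumption the data cannot check.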

    ------------------------------
    Peter Wollan
    Olmsted Medical Center
    ------------------------------




  • 8.  RE: Survival analysis or negative binomial regression?

    Posted 06-09-2015 13:31

    I do not see this as drawing patients. With each call you are sampling practices.

    Each practice is a cluster, with a distribution of appointment times. Each time you call a particular practice you are sampling from that distribution.

    The means (medians) of all these practice-specific distributions have a distribution themselves and it is the hyperparameter of this distribution that you are interested in (in the Bayesian sense).

    I would go with survival analyses with random effects (frailty).

    If you have multiple calls per practice, great. But if you only have one call per practice, then you only have 1 draw from each cluster and this reduces the analyses to the usual survival analyses modeling.

    I would NOT include practices that do not take new patients. They have no distribution of time-to-appointment (for new patients) so to speak. Conceptually, they are not even eligible when we estimate time to appointment for new patients. Practices w/ wait list are unclear, but I view them similarly to those that do not take new patients.

    But that is only if you are interested in time to appointment for NEW patients. If you are interested in time to appointment of any patient (including established patients), things are different. But your design is probably not quite right for that objective.

    Finally, I strongly favor survival analyses over negative binomial. In the past, I have had trouble finding negative binomial models that fit the data well (I have used Stata, which has a fairly flexible implementation of those models). Also, I can assess the fit better w/ survival models and have a better handle on what to do to improve the fit.

    Disclaimer: I am neither a survival expert nor a Bayesian, so I may be off in a lot of these things.


    ------------------------------
    Constantine Daskalakis
    Thomas Jefferson University, Philadelphia, PA
    ------------------------------




  • 9.  RE: Survival analysis or negative binomial regression?

    Posted 06-10-2015 07:21

    This discussion (or I, at least) would benefit from a careful explanation of why the data are more appropriately analyzed as time-to-event data.

    An important advantage of methods of survival analysis is that they can handle censored data. In this example, I do not see an observation on a specialist who is not accepting new patients as censored. Instead, one could regard the number of days for that observation as infinite.  I am reminded of the New Yorker cartoon that shows a man on the phone with someone who is trying to make an appointment; the caption is "How about never? Is never good for you?"  Treating such observations as infinite has worked well in some analyses that transformed the data to a reciprocal scale, replacing time with "speed."  Paul Velleman has an impressive example in which the observation is the time required to complete a task. Some children did not complete the task and hence had zero "speed." In principle, it might have been possible to treat their observations as censored at the time allotted for them to complete the task, but transforming to the reciprocal scale produced an effective analysis for the relation of the outcome to the predictors in those data.  The present example seems not to have an analogous censoring time for the specialists who are not accepting new patients.
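
    To make the reciprocal-scale idea concrete, here is a minimal sketch in which "never" becomes a speed of zero. The toy values, the group labels, and the use of `None` for clinics that would not schedule are assumptions for illustration only; this is not the analysis from the Velleman example.

```python
from statistics import mean


def to_speed(days):
    """Reciprocal transform: 1/days, with 'never' (None) mapped to 0.0."""
    return 0.0 if days is None else 1.0 / days


# Hypothetical toy data for two exposure groups; None means no appointment.
group_a = [40, 95, 208, None]
group_b = [3, 12, 40, 40]

speed_a = [to_speed(d) for d in group_a]
speed_b = [to_speed(d) for d in group_b]

# On the speed scale every clinic contributes a finite number, so ordinary
# summaries (and regressions) can include the 'never' clinics.
print("mean speed, group A:", round(mean(speed_a), 4))
print("mean speed, group B:", round(mean(speed_b), 4))
```

    A lower mean speed corresponds to longer waits, so the direction of any comparison is reversed relative to the days scale.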

    If methods for handling censoring are not needed, the analysis of the number of days until the appointment takes the form of a regression.  Then a key part of the task is to find a suitable model, including the random component.  If a negative-binomial distribution is not satisfactory, other choices are available, including "continuous" distributions (some of which are used in survival analysis).


    ------------------------------
    David Hoaglin
    ------------------------------



  • 10.  RE: Survival analysis or negative binomial regression?

    Posted 06-10-2015 07:56

    Zero-inflated negative binomial might give you the estimates you need.


    ------------------------------
    Brent Blumenstein
    ------------------------------




  • 11.  RE: Survival analysis or negative binomial regression?

    Posted 06-11-2015 09:46

    David,

    I'm not saying that the data are more appropriately analyzed by survival.

    I am saying that survival methods are a bit more flexible than negative binomial.

    As you say, the trick is finding a reasonable model. If negative binomial doesn't quite do, then you search for another regression model that is appropriate. And if you find something good you are set.

    But why not use survival methods? Yes, if you wanted to do parametric models you would still face the same problem. But we have non-parametric (Kaplan-Meier) and semi-parametric choices (Cox) that are more general and flexible. For sure, we have that proportionality assumption in Cox, but we do not have to worry about distributional assumptions. That's what I'm saying.

    There are other advantages as well (minor perhaps?) over negative binomial regression - familiarity, interpretation, presentation.

    With all that, I suspect that reasonable analyses with either method would give you qualitatively similar results.


    ------------------------------
    Constantine Daskalakis
    Thomas Jefferson University, Philadelphia, PA
    ------------------------------




  • 12.  RE: Survival analysis or negative binomial regression?

    Posted 06-10-2015 08:47

    I suggest that you investigate models from queuing theory.

    I am not particularly well versed in these, but it seems that the assumptions of some of these models match your research setting.

    ------------------------------
    Gretchen Donahue
    ------------------------------




  • 13.  RE: Survival analysis or negative binomial regression?

    Posted 06-12-2015 12:17

    First, you should be clear about what the population for your study is.  If the population is all the practice sites, then you should use all of them in your sample.  Since some practice sites do not accept new patients, a cure (cured-survivor) model needs to be fitted.   Otherwise, you can analyze only the sites that accept new patients.

     

    As pointed out by others, within a site the observations may be correlated, and hence an appropriate approach (such as a frailty model or GEE) should be applied.

     

    Negative binomial is often used for discrete survival analysis as a parametric approach.  It can also handle censoring.   If your data can fit such a model, you can use it.






  • 14.  RE: Survival analysis or negative binomial regression?

    Posted 06-10-2015 09:45

    I don't mean to complicate what is already an interesting discussion, but the responses presume some degree of independence among the observations.  To the extent that the clinics represent a finite set of service providers within a defined geographic and/or service-type catchment area, then I expect time-to-appointment may depend on the overall wait times in the area.  I don't know how to model this, but you might be interested to see if such a relationship exists in the data.  For example, if you had address or geographic coordinates for the clinic locations sampled, you might be able to identify cluster wait times and then look at variation within the clusters.  Also, since queue length may vary over time, you may also want to consider when the surveys were conducted unless they were all completed over a relatively short data collection window.

    ------------------------------
    Edward De Vos
    Assoc VP for Research
    William James College
    ------------------------------




  • 15.  RE: Survival analysis or negative binomial regression?

    Posted 06-14-2015 18:50

    I would group by number of days, then compare and contrast the highs and lows, and perform a probability analysis. If there are highly correlated factors, build a model; otherwise, consider scoring models.

    Al