ASA Connect

Missing value imputation in a competing risks setting

  • 1.  Missing value imputation in a competing risks setting

    Posted 01-14-2019 20:36
    Edited by Isabella Ghement 01-16-2019 15:21
    Hi everyone,

    Currently, I am working on a study where I need to compute the cumulative incidence of an event of interest (Event_1) in the presence of a competing event (Event_2).

    For each of these events, I have an indicator variable and a time to event variable. In particular:

        Indicator_1 = 1 if the subject experienced Event 1 by the end of the study;
                          = 0 if the subject did NOT experience Event 1 by the end of the study. 

        Indicator_2 = 1 if the subject experienced Event 2 by the end of the study;
                           = 0 if the subject did NOT experience Event 2 by the end of the study.

    Furthermore,

        Time_1 = time to Event 1 (if the subject experienced this event) or to the
                 end of study (if the subject did NOT experience this event);

        Time_2 = time to Event 2 (if the subject experienced this event) or to the
                 end of study (if the subject did NOT experience this event).
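
    To make the target quantity concrete, here is a minimal sketch (in Python rather than R, standard library only) of the usual nonparametric cumulative incidence estimate for Event 1 when Event 2 competes with it. The helper names first_event and cumulative_incidence are hypothetical, the example records are made up, and the sketch assumes each subject is first reduced to a single (time, status) pair with status 0 = censored, 1 = Event 1 first, 2 = Event 2 first.

```python
def first_event(time_1, ind_1, time_2, ind_2):
    """Collapse the two (time, indicator) pairs into one (time, status) pair.

    status: 0 = censored at end of study, 1 = Event 1 first, 2 = Event 2 first.
    Hypothetical helper; assumes none of the four inputs is missing.
    """
    candidates = []
    if ind_1 == 1:
        candidates.append((time_1, 1))
    if ind_2 == 1:
        candidates.append((time_2, 2))
    if not candidates:
        return (min(time_1, time_2), 0)
    return min(candidates)  # earliest event wins


def cumulative_incidence(records, cause=1):
    """Nonparametric cumulative incidence of `cause` from (time, status) pairs.

    Returns a list of (time, cumulative incidence) at each distinct event time.
    """
    event_times = sorted({t for t, s in records if s != 0})
    surv = 1.0  # probability of being free of both events just before t
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u, _ in records if u >= t)
        d_cause = sum(1 for u, s in records if u == t and s == cause)
        d_any = sum(1 for u, s in records if u == t and s != 0)
        cif += surv * d_cause / at_risk
        surv *= 1 - d_any / at_risk
        curve.append((t, cif))
    return curve


# Made-up records for illustration only: (Time_1, Indicator_1, Time_2, Indicator_2)
subjects = [(2, 1, 9, 0), (9, 0, 3, 1), (4, 0, 4, 0), (5, 1, 8, 1), (6, 0, 6, 0)]
records = [first_event(*s) for s in subjects]
for t, value in cumulative_incidence(records, cause=1):
    print(t, round(value, 3))
```

    A subject with missing Indicator_1 and Time_1 cannot be reduced to a (time, status) pair, which is why the missing values matter for this estimate. In R, the cuminc function in the cmprsk package computes the same kind of estimate.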

    Now, the problem is that for a number of subjects (about 10% of all subjects), the test which was supposed to determine whether or not they experienced Event_1 was not administered (for whatever reasons, which are not known).  So these subjects have missing values for both Indicator_1 and Time_1.  Thus, for these subjects, we don't know whether or not they would have experienced Event_1 before the end of the study.  We also don't know the corresponding time to event (possibly censored).

    My questions are:  Is it possible, just from the information provided, to impute the missing values for the indicator variable Indicator_1 and the corresponding time variable Time_1? If it is, how can this be done (in R)?  If it isn't, what other options should I consider for analyzing these data?  (I do have access to some covariates such as age, sex, etc.)

    Any comments, references or ideas would be greatly appreciated. 

    Thank you in advance for your help!

    Isabella

    P.S.  Many thanks to Alan Forsythe for pointing out that there was a typo in the definition of Indicator_2. This indicator should be defined in relation to subjects experiencing (or not) Event_2, rather than Event_1.  

    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    E-mail: Isabella@ghement.ca
    Phone: 604-767-1250
    ------------------------------


  • 2.  RE: Missing value imputation in a competing risks setting

    Posted 01-15-2019 04:01
    Hi,

    The responses for these indicators are binary, meaning the data are not numerical, so imputing them may not be a good idea. For the sake of reliable output, it may be better to omit that 10% of subjects.

    ------------------------------
    W.G. Samanthi Konarasinghe (PhD)
    Statistician /Senior Lecturer
    Institute of Mathematics & Management,
    Sri Lanka
    ------------------------------



  • 3.  RE: Missing value imputation in a competing risks setting

    Posted 01-15-2019 04:43
    Hello Isabella,

    I am not an expert in survival analysis, but, from my reading of your question, it seems like imputation should be feasible. My first thought is to use some type of sequential regression imputation wherein the incomplete indicator variable is imputed using binary logistic regression and the incomplete time-to-event variable is imputed using predictive mean matching (PMM). This approach could be implemented using the mice package in R.

    Hopefully, the rationale for imputing the indicator variable with logistic regression is obvious. As far as the survival times go, their distribution is probably substantially non-normal, which limits your options when imputing with canned software packages. As a donor-based method, however, PMM should automatically accommodate the skewed survival times by replacing the missing values with observed survival times from matched cases.
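
    As a rough illustration of the donor idea behind PMM, here is a minimal Python sketch (standard library only). It is a simplified single-covariate version, not the mice implementation: it fits one least-squares line and skips the parameter draw that proper multiple imputation uses. The names fit_line and pmm_impute, the choice of age as the only predictor, and the example values are all assumptions made for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, rng=None):
    """Fill None entries of y with observed values borrowed from donors.

    For each missing case, the k observed cases with the closest predicted
    value are the donors, and one donor's observed y is copied at random.
    """
    if rng is None:
        rng = random.Random(0)
    observed = [i for i, value in enumerate(y) if value is not None]
    intercept, slope = fit_line([x[i] for i in observed], [y[i] for i in observed])
    predicted = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            donors = sorted(observed, key=lambda j: abs(predicted[j] - predicted[i]))[:k]
            completed[i] = y[rng.choice(donors)]
    return completed


# Made-up example: age as the predictor, time to event with one missing value.
age = [40, 50, 60, 70, 55]
time_1 = [10.0, 8.0, 3.0, 2.0, None]
print(pmm_impute(age, time_1, k=2))
```

    Because every imputed value is copied from an observed case, the imputations stay within the observed range and inherit the skewness of the observed survival times. In R, mice provides this as method "pmm", and logistic regression imputation as method "logreg".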

    Since you have a relatively small amount of missing data, and what looks to be a good pool of predictors for the imputation models, I would anticipate imputation working pretty well for your problem. Van Buuren's 2012 book Flexible Imputation of Missing Data contains a thorough discussion of the mice software and its capabilities as well as the use of predictive mean matching for imputing skewed continuous variables. Both Don Rubin and Rod Little have written several papers developing and evaluating the PMM method.

    Regards,
    Kyle Lang

    ------------------------------
    Kyle M. Lang
    Assistant Professor
    Dept. of Methodology & Statistics
    Tilburg University
    ------------------------------



  • 4.  RE: Missing value imputation in a competing risks setting

    Posted 01-16-2019 14:55
    Hi Kyle, 

    Thank you so much for taking the time to answer my question - I really appreciate it!

    In case you wanted to check this, I provided a few more details about the study in my answer to Jonathan's questions. 

    Do I understand your answer correctly that you would first impute the incomplete indicator variable Indicator_1 for the event of interest Event_1, and then, once that is imputed, you would impute the time to event variable Time_1?   (I am interpreting sequential regression imputation in a "first, then" kind of way.) 

    If the imputation of the indicator variable Indicator_1 can be performed first (independently of the imputation of Time_1), then should that imputation use information on Indicator_2 and Time_2 corresponding to the competing event (in addition, perhaps, to using information on the available covariates such as Age at Diagnosis,  Gender)?  In particular, we know that if study subjects have experienced Event_2 (Death), they can't experience Event_1 beyond Time_2, though it is of course possible that they could have experienced Event_1 prior to Time_2.  

    Furthermore, when imputing Time_1, should we use the (now imputed) Indicator_1 in our imputation model for a skewed response variable, along with Indicator_2 and Time_2 and the available covariates? 

    I just want to make sure I am understanding your suggestions correctly and also am not trying to mis-apply them.  If you could confirm whether I am on the right path, that would be great! 

    Many thanks, 

    Isabella

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 5.  RE: Missing value imputation in a competing risks setting

    Posted 01-17-2019 08:36
    Hello Isabella,

    No worries; I'm happy to offer what advice I can. I think your understanding of my recommendation is pretty much correct. Below, I've added some further thoughts relating to your follow-up questions and your responses to Jonathan's questions.

    If I am correctly understanding your answers to Jonathan's questions, then it seems like the Event 1 indicator variable and the Event 1 time-to-event variable are always observed together or missing together (for a given patient). In that case, you would not want to use the imputed indicator variable as a predictor when imputing the time-to-event variable. If you used the imputed indicator variable as a predictor for the time-to-event imputation, the information used to generate the imputations would be entirely implied by the covariates (i.e., the variables used to generate the imputations of the indicator variable), so you gain nothing over imputing time-to-event with the covariates as the only predictors.

    For the same reasons given above, it doesn't matter whether you impute the indicator variable before or after the time-to-event variable (since the imputed values of one will not influence the imputations of the other).

    As noted above, I wouldn't recommend using the Event 1 indicator variable to impute the Event 1 time-to-event variable (or vice versa), but I would suggest using both Event 2 variables as well as your covariates to impute both Event 1 variables. Since the two events are tied by a known relational rule (i.e., a patient cannot experience Event 1 after death), the Event 2 variables should inform the imputation of the Event 1 variables.

    In terms of enforcing the known relational rules (since the imputation model won't get it right every time), the most pragmatic solution is probably to impute first and edit the data to conform to the rules afterward. That is, after imputing the indicator and time-to-event variables, any patient who experiences Event 2 before the time that the imputations suggest they experienced Event 1 would have their imputed indicator and time-to-event data deleted. If the imputations are violating the relational rule frequently (i.e., a subjectively defined proportion of violations that makes you uncomfortable), then you may want to question the validity of the imputations. If your imputation model often fails to maintain known relations among the variables, then the model is doing a poor job of learning the structure of the data, so the imputations may be poor representations of the true values.
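
    The impute-then-edit step could be sketched as follows in Python (standard library only). The function name enforce_death_rule, the dictionary keys, and the example rows are hypothetical. The sketch assumes each row carries a flag marking whether its Event 1 values were imputed, and it reports the share of imputed rows that broke the rule so that the violation rate can be judged.

```python
def enforce_death_rule(rows):
    """Blank out imputed Event 1 values that fall after an observed death.

    rows: list of dicts with keys time_1, ind_1, time_2, ind_2, imputed.
    Returns (cleaned_rows, violation_rate among imputed rows).
    """
    cleaned = []
    violations = 0
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        breaks_rule = (
            row["imputed"]
            and row["ind_1"] == 1
            and row["ind_2"] == 1
            and row["time_1"] > row["time_2"]
        )
        if breaks_rule:
            row["ind_1"] = None
            row["time_1"] = None
            violations += 1
        cleaned.append(row)
    n_imputed = sum(1 for row in rows if row["imputed"])
    rate = violations / n_imputed if n_imputed else 0.0
    return cleaned, rate


# Made-up rows for illustration only.
rows = [
    {"time_1": 10, "ind_1": 1, "time_2": 6, "ind_2": 1, "imputed": True},   # Event 1 after death
    {"time_1": 4, "ind_1": 1, "time_2": 6, "ind_2": 1, "imputed": True},    # consistent
    {"time_1": 3, "ind_1": 1, "time_2": 12, "ind_2": 0, "imputed": False},  # observed
]
cleaned, rate = enforce_death_rule(rows)
print(cleaned[0], rate)
```

    With multiple imputation, a check like this would be run on each completed data set separately.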

    Regards,
    Kyle

    ------------------------------
    Kyle M. Lang
    Assistant Professor
    Dept. of Methodology & Statistics
    Tilburg University
    ------------------------------



  • 6.  RE: Missing value imputation in a competing risks setting

    Posted 01-17-2019 18:00
    Thanks so much, Kyle!  This is very useful and I am learning a lot from your answers.  (: 

    Further digging into this project revealed the following information:  

    1. There were 102 patients in total.
    2. 16 of these patients had no abnormalities prior to transplant, so they weren't tested for abnormalities post-transplant via the FISH test.  These patients can be safely excluded from the analysis, leaving 102 - 16 = 86 patients for further consideration.  (The event of interest, Event_1, is clearance of abnormalities post-transplant.)
    3. Of the 86 remaining patients, 12 had their transplant in 2004, prior to the availability of the FISH test, so it was impossible for them to have been tested for abnormalities pre-transplant.  If they weren't tested pre-transplant and established to have abnormalities, it was also impossible for them to have been checked for clearance of abnormalities post-transplant (assuming the post-transplant test would have occurred when the FISH test did become available). 
    4. A further 3 patients among the 86 patients mentioned above did not have a FISH test pre-transplant for whatever reason, even though their transplant occurred after the FISH test became available.    

    I am inclined to exclude the 12 patients from consideration under the guise that it would have been impossible to assess their event of interest (i.e., clearance of abnormalities) around the time when they had their transplant since the FISH test did not exist at that time, leaving me with 86 - 12 = 74 patients.  Of the 74 patients, 3 have missing values on Event_1 and Time_1.  This is a relatively small percentage of data missingness, namely 4.1%, so I would just analyze the data for the 74 - 3 = 71 patients without missing values on Event_1 and Time_1.  But is this a reasonable approach? 
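
    The patient counts above can be checked with a few lines of Python. The variable names are hypothetical. The last quantity is not stated in this post: it is the share of missing Event_1 data among the 86 patients if the 12 pre-FISH patients were kept rather than excluded.

```python
total_patients = 102
no_abnormalities = 16      # never tested post-transplant, excluded
transplant_pre_fish = 12   # transplant before the FISH test was available
untested_other = 3         # no FISH test for unknown reasons

with_abnormalities = total_patients - no_abnormalities       # 102 - 16
fish_era = with_abnormalities - transplant_pre_fish          # 86 - 12
complete_cases = fish_era - untested_other                   # 74 - 3

pct_missing_fish_era = 100 * untested_other / fish_era
pct_missing_if_all_kept = 100 * (transplant_pre_fish + untested_other) / with_abnormalities

print(with_abnormalities, fish_era, complete_cases)
print(round(pct_missing_fish_era, 1), round(pct_missing_if_all_kept, 1))
```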

    Or should I impute the missing Event_1 and Time_1 values using the methods you suggested, but making sure to add an indicator variable for when the patient had their transplant in the imputation models being used: before 2004 and after 2004?  It seems like missingness comes in two flavours here - it is informative for the 12 patients in item 3. above but uninformative for the 3 patients in item 4. above (?).  I think the approach you suggested would assume uninformative missingness for all patients with missing Event_1 and Time_1 values (?). 

    I don't want to complicate things but want to make sure that whatever solution I choose is sensible given the information listed in items 1. - 4. above.  The simpler the solution, the better, but beggars can't be choosers.   

    Thanks again, 

    Isabella

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 7.  RE: Missing value imputation in a competing risks setting

    Posted 01-21-2019 05:02
    Hello Isabella,

    I think your rationale for excluding the 12 patients who had their transplant prior to 2004 is reasonable, but the justification for excluding the remaining three patients is less sound. The important thing to keep in mind whenever removing cases (for whatever reason) is that doing so usually alters the population to which you can generalize your inferences. For example, when you exclude everyone who had their transplant before 2004, you can only generalize to patients who receive a transplant after 2004. Since you don't know why the three remaining patients have missing measurements, you also don't know how excluding them will affect the generalizability of your inferences. That being said, since we're talking about less than five percent of the sample, the impact on your inferences will probably be minimal.

    To be on the safest side, I would recommend running the analysis both ways (i.e., once with the 15 cases excluded and once with their missing data imputed as we've been discussing). If the imputations seem to be behaving in a reasonable way, then differences in the two sets of results can tell you something about the different outcomes expected in the original and restricted populations. If the imputations don't behave, you could interpret only the restricted results while acknowledging any potential limits to generalizability.

    Regards,
    Kyle

    ------------------------------
    Kyle M. Lang
    Assistant Professor
    Dept. of Methodology & Statistics
    Tilburg University
    ------------------------------



  • 8.  RE: Missing value imputation in a competing risks setting

    Posted 01-16-2019 09:28
    I have a couple of questions about this.

    1. How do we know when the events occurred? Can we tell an exact date by looking once at the end of the study, or do we have to assess periodically as the study is going on?
    2. If periodically, what was the duration of the study and how frequently were the assessments made?
    3. How did the missing patients come to get assessed for one event but not the other (possibly at every assessment)? 
    4. Was there a particular individual or site who did this (or other systematic cause) or was the missing data distributed more or less randomly?
    5. Is there an event hierarchy here? Can one event precede the other but not vice versa? 
    6. If you have an imputation method you can really believe in, why wouldn't you also use it to impute the event date for patients whose events hadn't occurred by the end of the study? Why just use it for the patients with no event data at all?

    Note: Censoring at day 1 (or, if a subsequent event in an event hierarchy, the day after the preceding event or end of study if the preceding event never occurs) would be the traditional method if non-informativity of missingness assumptions can be justified. 




    ------------------------------
    Jonathan Siegel
    Deputy Director Clinical Statistics
    ------------------------------



  • 9.  RE: Missing value imputation in a competing risks setting

    Posted 01-16-2019 14:43
    Thank you for your excellent questions, Jonathan!  Please see my answers below.

    Question 1, 2 and 3

    The subjects in this study are patients who received a stem cell transplant (SCT).  

    The event of interest (Event_1) is clearance of FISH abnormalities post-SCT.  Patients were tested for (genetic) abnormalities once sometime prior to the transplant and once sometime after the transplant using the FISH test.  If the abnormalities present pre-SCT were found to have cleared post-SCT, the patients were deemed to have experienced Event_1.  The date when the clearance occurred post-SCT (if it did occur) is known for all patients who received the FISH test pre- and post-SCT.  However, some patients received the FISH test pre-SCT but not post-SCT, which means that, for those patients, we do not know whether or not they would have experienced clearance of abnormalities.  The time when they would have experienced clearance is also not known. 

    The competing event (Event_2) is  Death.   We do know, for all patients of interest, whether or not they died by the end of the study.  For those who died, we know their exact death date in relation to the date of the transplant.  Patients were monitored for several years (up to about 14-15 years for some patients).

    So there is no periodic assessment of when the clearance might have occurred - this is assessed just once during the study period, though how the clinicians decide when a patient should be assessed and why I do not know.  Whether they (i) deemed the patient didn't warrant follow-up testing for clearance of abnormalities based on clinical indications, (ii) forgot to administer the test post-SCT even though they did administer it pre-SCT or (iii) administered the test post-SCT but didn't record the result - there really is no such information available. 

    Even for patients who were tested for clearance post-SCT, I guess it is conceivable (?) that a patient who had not achieved clearance at their initial test might have achieved it had they been tested again at a later date.

    Question 4: Was there a particular individual or site who did this (or other systematic cause) or was the missing data distributed more or less randomly?

    It is my understanding that these patients all come from a single cancer agency.  I can ask whether all of these patients were seen by the same doctor.  Not sure whether any other systematic cause would have been responsible for these patients not being tested for clearance post-SCT.

    Question 5: Is there an event hierarchy here? Can one event precede the other but not vice versa?

    Patients can  experience post-SCT clearance first and then die.  Or they can die before having a chance to experience clearance, in which case death precludes clearance.  So Event_1 (Clearance) can precede Event_2 (Death), and if Event_2 occurs, Event_1 can no longer occur.   

    Question 6: If you have an imputation method you can really believe in, why wouldn't you also use it to impute the event date for patients whose events hadn't occurred by the end of the study? Why just use it for the patients with no event data at all?

    I don't really have an imputation method I can really believe in, hence my question on this forum.  (: 

    For patients who were tested for clearance post-SCT but were found not to have cleared their abnormalities, I guess I am making the implicit assumption that they would have maintained the same non-clearance status until the end of the study or death, whichever comes first.  Maybe this assumption is justified by the fact that they would have been tested for clearance just once or maybe it is not.  

    For patients who were NOT tested for clearance post-SCT, my predicament is that I do not know which of them would have rendered a positive test for clearance of abnormalities and which of them would have rendered a negative test post-SCT.  I can assume the two extremes: all would have rendered a positive test and all would have rendered a negative test.  This way, I would fill in the values of the Indicator_1 variable with all 1's or all 0's and only worry about imputing the time to event.  But I still don't know what the best way to do that is, given that I also have information about Indicator_2 and Time_2, as well as some covariates.  What about the situation where some patients render a positive test and the other a negative test? 
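
    The two extremes described above can be sketched as simple bounds on the crude proportion of patients who cleared, ignoring time and the competing event. This is a minimal Python sketch; the function name extreme_case_bounds and the example counts are hypothetical and are not taken from the study data.

```python
def extreme_case_bounds(n_cleared, n_not_cleared, n_missing):
    """Bounds on the crude clearance proportion under the two extremes.

    Lower bound: every untested patient gets Indicator_1 = 0.
    Upper bound: every untested patient gets Indicator_1 = 1.
    """
    n_total = n_cleared + n_not_cleared + n_missing
    lower = n_cleared / n_total
    upper = (n_cleared + n_missing) / n_total
    return lower, upper


# Made-up counts for illustration only.
lower, upper = extreme_case_bounds(n_cleared=40, n_not_cleared=31, n_missing=3)
print(round(lower, 3), round(upper, 3))
```

    Any split of the untested patients between positive and negative tests gives a crude proportion between these two bounds, so a narrow gap suggests the missing indicators have little influence on that summary.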


    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------



  • 10.  RE: Missing value imputation in a competing risks setting

    Posted 01-18-2019 11:30

    Dear Isabella,

    Based on your answers, I wanted to suggest that the FISH endpoint doesn't really meet the requirements for a time-to-event endpoint. You might want to consider using a binomial (or logistic) approach instead. The death endpoint is time-based, but not the FISH endpoint. 

    The problem here is that the FISH assessment takes place exactly once after SCT, at an unknown time, which might be considerably less than the time assessed until death. In addition to not really knowing what time the FISH assessment occurred, you explained that you don't really care, because you are willing to assume that if the patient has no  FISH abnormalities by the FISH assessment, they will never happen. (This is why you are willing to censor at date of study end, rather than at first study day, which is the last day you know the patient's status). 

    If time doesn't matter, then we don't have a time to event endpoint at all. I particularly wouldn't censor at end of study, as the assumption that once first evaluated, whenever first evaluated, the hazard is thereafter 0 is particularly inconsistent with TTE assumptions. I think, on reflection, it is really more consistent with a binomial with all hazard occurring at an initial point mass and none thereafter.  If you both don't really know what time it is, and you don't really care about time, you don't have a time-based endpoint. 

    For this reason, I wouldn't use a composite TTE approach like competing risks at all. 

    Even if we knew the time of the FISH assessment, I would truncate any composite analysis at the point of that assessment. Once the single FISH assessment occurs, the proverbial diamond watch stops cold dead. The study may give the patients time enough to die. But it will never again give them another moment in time to see if their FISH status changes. So we can't draw any inferences about FISH, or any composite inferences, beyond the point at which FISH stops being systematically assessed. 
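
    One simple way to sketch the binomial alternative is a proportion of tested patients who cleared, with a confidence interval. The Python below (standard library only) uses the Wilson score interval, which is a choice made here for illustration and is not named in this post. The function name wilson_interval and the example counts are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Made-up counts for illustration only: 40 of 71 tested patients cleared.
low, high = wilson_interval(40, 71)
print(round(40 / 71, 3), round(low, 3), round(high, 3))
```

    A logistic regression of the clearance indicator on covariates such as age and sex would extend the same binomial view to adjusted comparisons.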



    ------------------------------
    Jonathan Siegel
    Deputy Director Clinical Statistics
    ------------------------------



  • 11.  RE: Missing value imputation in a competing risks setting

    Posted 01-18-2019 17:35
    Hi Alessandro, 

    I don't know that you would need to justify the time - as a professional, all you can do is give your honest assessment of what you think it might take in terms of estimated time and budget to complete a project. 

    If you think, based on your prior experience, that a project would take you 10 - 20 hours to complete (or 40 - 80 hours, etc.), then you can let the potential client know that and they can decide whether that time range will work for them - assuming you charge an hourly rate.  In general, the more statistical analyses a project involves, the more time it will take to implement them.  If formal reporting is necessary, that alone can add 8-16 hours or more to the tally of hours.  Communications by phone with the client can also add up.   

    Most reasonable clients would tell you whether or not they can afford to cover the upper limit of the time estimate you provided.  If they think the upper time bound is too high, that will be an incentive for both you and them to prioritize what needs to be done, so that the most important aspects of the analysis will be covered.  Or for the client to use you sparingly only for those parts of the analyses that need your high-level input.  

    It's always a good practice to allow for a time and budget contingency (e.g., 10% - 20%), especially for large problems, where the project scope can shift over time.  This way, both you and the client will have some flexibility to work with.  If you make it clear that you charge only for the time you actually spend on the project, the client will feel comfortable that you won't eat away that contingency budget just because it is there.  

    In my practice, I noticed that it is much easier to secure funding at the inception of a project than once the project nears completion.  You can advise your clients that, if their budget runs out prior to project completion, you'd be happy to help them with wrapping up the project by giving them 1-2 hours for a small project or a bit more for a large project - especially if those are long standing clients.   This should be done when the project is about three quarters of the way through.  This way, clients won't expect unlimited free work from you.  You can also emphasise when you provide the initial estimates of time and budget that these are estimates and the actual times may vary depending on any unexpected issues you will encounter with the data, statistical programming or statistical methodology.  In that case, you can re-assess priorities and work with the client to make sure the key priorities get addressed.  

    There are situations where the client won't budge and will expect you to work for free once their budget runs out.  You can still work with the client to give them something but do try as much as possible not to allow yourself to be wrestled into a corner and end up doing half of the work for free.  Setting some clear boundaries in terms of what you can and cannot accommodate as a professional will be helpful.  It is important though to work with the client from the same side of the table - that will stand you in good stead.  But educating the client in a respectful way is necessary in these situations.  

    To sum up, all you can do is provide an estimate of what you think is doable for a given project.  That estimate is your professional, reasonable opinion.  Another professional might give a different estimate, since they will have a different level of training, expertise, skillset, availability, etc.  Over-estimating slightly the time involved is better than under-estimating it - to that end, having a time contingency will come in handy.  Allowing sufficient time for communication and reporting will also help.  In terms of analyses, try to think about how many analyses are involved and whether they are simple or complex in nature.  You can estimate that a simple analysis might take you 4-6 hours, say, whereas a complex one might take you 8-16 hours.  Then scale those numbers up depending on the overall number of analyses and see where that takes you.  In general, we tend to under-estimate how long analyses will take, so you can pad your estimates up slightly to account for that (perhaps unjustified) optimism.      

    Estimates should be provided to the client once you have a clear understanding of all the tasks involved.  Sometimes the tasks are not clear and just clarifying them would take a lot of time.  If that is the case, you can negotiate some initial time to cover task clarification if that's feasible.  When you actively engage the client in prioritizing the most important tasks, that also communicates the message that some lower-level priority tasks might not be addressed if the budget runs out.  

    Of course, every consultant goes about this process differently.  But no matter what route you choose, it helps to adopt the mindset "this is what I can do for you from an objective, professional standpoint and these are the conditions under which I think I can do it".  If you don't think you can do it or if the conditions are not reasonable for you or the client, it's only fair to the client to let them know and recommend someone else who would be a better fit.  As a professional, it's important to set yourself up for success, because then your client will be successful too! 

    If you have more specific questions about this, please post them on this forum so that we can provide you with further guidance. 
     
    Disclaimer:  I am an independent statistical consultant.

    Best regards, 

    Isabella 


    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    ------------------------------