  • 1.  modeling a binomial outcome measured multiple days

    Posted 05-06-2011 15:17

    I am an independent consultant and have a complicated dataset.  I appreciate your ideas on how to approach this problem. 

    The purpose of the study is to describe the factors associated with blood transfusion and to determine their relative importance in a cohort of patients with a certain condition. The investigators are interested in developing a model that predicts the physicians' decision to transfuse.  The results of the study could be used to modify hospital transfusion practices.

    This is a retrospective study, post hoc / secondary analysis using a dataset that was collected for a different multicenter study.

    The outcome the investigators want to model is whether or not a patient was transfused on any given day during their hospital stay.  Transfusion (yes/no) is measured daily until the patient is discharged. 

    Variables collected include one-time measurements (e.g., demographics, baseline lab values, other conditions) as well as daily measurements (e.g., lab values).  Most lab values were measured on days 1-7, but not all (some were measured on days 1-4 and day 7).

    The dataset includes about 1000 patients.  These patients collectively had 502 transfusions out of 6496 patient days.

    The analysis is complicated by several considerations:

    • Multiple measures per patient.  If patient-day is the unit of observation, then days from the same patient are correlated.
    • Patients can have days of transfusion as well as days without transfusion.
    • Data are censored for various reasons (discharge, death, or another complicating factor), so patients contribute different numbers of days to the model.
    • Time series: the decision to transfuse on a given day may depend on whether or not the patient was transfused on the previous day(s).
    • Some factors are measured only once, while others are measured daily.
    • Data are not missing at random.  Data are usually missing for clinically-related reasons.  Results are missing on a given day because the physician did not think that it was necessary to order the test or to record that particular variable on a particular day.  The model would need to include the fact that a test was ordered or not ordered.  If the test was ordered, then  the model would need to include the result of the test (which is usually a continuous variable, e.g., serum creatinine could take any value > 0 mg/dL or be not done).  If the test was not ordered, then this effect may also be important to model.
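
    The patient-day layout described in the list above can be sketched as follows. This is a minimal illustration, not the study's actual data structure: the field names (`creatinine`, `creat_ordered`, `prev_transfused`) and the sample values are hypothetical. It shows one way to carry "test ordered or not" as its own indicator alongside the test value, plus a lagged transfusion indicator. Setting the value to 0 when the test was not ordered is an assumption made so that the value's coefficient only contributes on days the test was ordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientDay:
    patient_id: int
    day: int
    transfused: int               # 1 = transfused that day, 0 = not
    creatinine: Optional[float]   # None = test not ordered that day (hypothetical lab)


def build_features(rows):
    """Return one feature dict per patient-day, sorted by patient and day."""
    rows = sorted(rows, key=lambda r: (r.patient_id, r.day))
    features = []
    prev = None
    for r in rows:
        same_patient = prev is not None and prev.patient_id == r.patient_id
        consecutive = same_patient and prev.day == r.day - 1
        ordered = r.creatinine is not None
        features.append({
            "patient_id": r.patient_id,
            "day": r.day,
            "transfused": r.transfused,
            # Indicator that the physician ordered the test on this day.
            "creat_ordered": int(ordered),
            # Placeholder 0.0 when not ordered (assumption, see lead-in).
            "creat_value": r.creatinine if ordered else 0.0,
            # Transfusion status on the previous day; 0 on a patient's first
            # recorded day or after a gap (assumption).
            "prev_transfused": prev.transfused if consecutive else 0,
        })
        prev = r
    return features


# Made-up example records for two patients.
rows = [
    PatientDay(1, 1, 0, 1.1),
    PatientDay(1, 2, 1, None),
    PatientDay(1, 3, 0, 1.4),
    PatientDay(2, 1, 1, 0.9),
    PatientDay(2, 2, 0, None),
]
features = build_features(rows)
for f in features:
    print(f)
```

    Whether an indicator-plus-value coding is appropriate depends on why tests were skipped, which is a clinical question rather than a programming one.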

    -------------------------------------------
    Nancy Buderer, MS
    Biostatistician & Research Consultant
    nancy@budererdrug.com
    -------------------------------------------


  • 2.  RE:modeling a binomial outcome measured multiple days

    Posted 05-06-2011 18:56

    One of two approaches is typically used for this type of situation: generalized estimating equations (GEE) or generalized linear mixed models. GEE generates only marginal (i.e., not conditioned on subject ID) estimates but is arguably a little simpler to implement. Depending on the software, you can probably model an autoregressive or similar intra-subject correlation structure. Both methods support unbalanced data (missing, censored, etc.) in a missing-at-random context.
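
    For readers less familiar with the marginal/conditional distinction, the two model forms can be written side by side. This is a sketch with a single random intercept per patient; the notation ($Y_{ij}$ for patient $i$ on day $j$, $x_{ij}$ for covariates, $b_i$ for the patient effect) is introduced here for illustration and does not come from the original post.

```latex
% Marginal (population-averaged) model, as estimated by GEE:
\operatorname{logit} P(Y_{ij} = 1 \mid x_{ij}) = x_{ij}^{\top} \beta^{M}

% Conditional (subject-specific) model, as estimated by a mixed model:
\operatorname{logit} P(Y_{ij} = 1 \mid x_{ij}, b_i) = x_{ij}^{\top} \beta^{C} + b_i,
\qquad b_i \sim N(0, \sigma^2)

% Well-known approximation linking the two under a normal random intercept:
\beta^{M} \approx \frac{\beta^{C}}{\sqrt{1 + c^{2}\sigma^{2}}},
\qquad c = \frac{16\sqrt{3}}{15\pi} \approx 0.588
```

    The approximation shows why marginal coefficients are attenuated toward zero relative to subject-specific ones, and why the two sets of odds ratios answer different questions.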


    -------------------------------------------
    Jeremy Weedon
    Asst. Clinical Professor (biostatistician)
    SUNY/Downstate Medical Center
    -------------------------------------------








  • 3.  RE:modeling a binomial outcome measured multiple days

    Posted 05-08-2011 11:37
    I have not done such an analysis. I checked the documentation for version 19. It appears that the GENLIN procedure in SPSS would do at least some of what you want.  It does handle repeated measures with different numbers of repeats.  It also has several kinds of "links".

    Does your variable about testing have three values: 1) ordered, 2) not ordered, 3) don't know?

    I suggest you think about whether there are a series of questions and whether it is necessary that one model answer all of the questions.

    Two very experienced colleagues who are not on this list had some reactions when I forwarded the post to them:
    Bruce Weaver said:
    The data Nancy describes have a multilevel structure, with daily data at level 1 and "one-time" (or patient level) data at level 2.  I'm still on v18, but I believe the new multilevel GENLIN procedure in v19 can perform multilevel regression with a binomial error distribution and a variety of link functions (e.g., logit link if you want multilevel logistic and odds ratios, or an identity link function if you want risk differences, etc.). Someone who has v19 may be able to comment further.
    Rich Ulrich said:
     
    I'll offer a massive simplification of the statistical problem. I notice that there are 1000 patients, and 502 transfusion events. Since some patients had more than one, there are fewer than 500 patients with a transfusion....  Presumably the others were selected by criteria which do need to be noted. The model should be divided into two questions -  "Transfusion: Yes/no"; and "When".  Or perhaps the main interest will be satisfied by the first question alone.  Or, the answers to the first question should be the starting point for looking at the second question. The question of "When"  might be simplified, also, by deciding to model the occurrence of "First transfusion".  Whether it is worth looking at additional transfusions could depend on the amount of data available. 
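
    The simplification suggested above (first "Transfusion: yes/no", then "When", possibly restricted to the first transfusion) amounts to collapsing patient-days to one row per patient. A minimal sketch, assuming the input is a list of hypothetical `(patient_id, day, transfused)` tuples with made-up values:

```python
def summarize_patients(patient_days):
    """Collapse (patient_id, day, transfused) tuples to one summary per patient."""
    summary = {}
    # Sorting puts each patient's days in ascending order.
    for pid, day, transfused in sorted(patient_days):
        s = summary.setdefault(pid, {
            "ever_transfused": 0,           # answers "Transfusion: yes/no"
            "first_transfusion_day": None,  # answers "When" for the first event
            "n_transfusion_days": 0,
            "days_observed": 0,             # differs by patient due to censoring
        })
        s["days_observed"] += 1
        if transfused:
            s["ever_transfused"] = 1
            s["n_transfusion_days"] += 1
            if s["first_transfusion_day"] is None:
                s["first_transfusion_day"] = day
    return summary


# Made-up example: three patients with different lengths of stay.
data = [(1, 1, 0), (1, 2, 1), (1, 3, 1), (2, 1, 0), (2, 2, 0), (3, 1, 1)]
summary = summarize_patients(data)
for pid, s in summary.items():
    print(pid, s)
```

    The `ever_transfused` column would feed an ordinary patient-level logistic model, while `first_transfusion_day` together with `days_observed` has the shape of time-to-event data.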


    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 4.  RE:modeling a binomial outcome measured multiple days

    Posted 05-09-2011 07:06
    Another colleague has confirmed that GENLIN is likely to be usable for some of your analysis.  If you have SPSS, tweak the simulation so that it more closely fits your situation.
    Ryan Black said:
    Art,
     
    A GEE/Generalized Linear Model can be fit employing the GENLIN procedure in SPSS. GENLIN is capable of fitting various generalized linear models that account for correlation among repeated measures.
     
    I provide a simple example below which assumes data are derived from a logistic regression equation with correlation among repeated observations. Note that data for 1000 subjects are generated, all of whom are measured 50 times. Also note that a time-varying covariate, x1, was incorporated into the model.
     
    **I wrote the code below very fast. Apologies if there are any typos.
     
    HTH,
     
    Ryan
     
    * Generate data: 1000 subjects, 50 repeated measures each.
    set seed 98765432.

    new file.
    input program.

    * Define the variables first so that LEAVE can retain their values across cases.
    compute ID = -99.
    compute x1 = -99.
    compute b0 = -99.
    compute b1 = -99.
    compute rand_eff = -99.
    compute time = -99.

    leave ID to time.

    loop ID = 1 to 1000.
       * Fixed effects and a subject-level random intercept with variance 0.3.
       compute b0 = -0.5.
       compute b1 = 1.0.
       compute rand_eff = sqrt(.3)*rv.normal(0,1).

       loop time = 1 to 50.
          * Time-varying covariate, linear predictor, probability, and binary outcome.
          compute x1 = rv.normal(2,1).
          compute eta = b0 + b1*x1 + rand_eff.
          compute p = exp(eta) / (1 + exp(eta)).
          compute y = rv.bernoulli(p).
          end case.
       end loop.
    end loop.

    end file.
    end input program.

    execute.

    delete variables b0 b1 rand_eff eta p.

    * Fit a GEE logistic model with an exchangeable working correlation.
    GENLIN y (REFERENCE=FIRST) WITH x1
      /MODEL x1 INTERCEPT=YES
        DISTRIBUTION=BINOMIAL LINK=LOGIT
      /REPEATED SUBJECT=ID WITHINSUBJECT=time SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES
        COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
      /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
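
    For readers without SPSS, the data-generating step of the example above can be sketched in plain Python. This is only an illustration of the same random-intercept logistic model; it does not fit a GEE, it will not reproduce the SPSS random draws even with the same seed value, and the smaller default sample size (200 subjects, 20 time points) is an assumption chosen to keep it quick.

```python
import math
import random


def simulate(n_subjects=200, n_times=20, b0=-0.5, b1=1.0, re_var=0.3,
             seed=98765432):
    """Simulate correlated binary outcomes from a random-intercept logit model."""
    rng = random.Random(seed)
    rows = []
    for subject in range(1, n_subjects + 1):
        # One random intercept per subject induces within-subject correlation.
        rand_eff = rng.gauss(0.0, math.sqrt(re_var))
        for time in range(1, n_times + 1):
            x1 = rng.gauss(2.0, 1.0)                  # time-varying covariate
            eta = b0 + b1 * x1 + rand_eff             # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))          # inverse logit
            y = 1 if rng.random() < p else 0          # Bernoulli draw
            rows.append((subject, time, x1, y))
    return rows


rows = simulate()
prop = sum(r[3] for r in rows) / len(rows)
print(f"{len(rows)} subject-time rows, proportion with y = 1: {prop:.3f}")
```

    Each tuple is one subject-time row, the same long format that the REPEATED subcommand expects via SUBJECT and WITHINSUBJECT.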



    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------








  • 5.  RE:modeling a binomial outcome measured multiple days

    Posted 05-09-2011 10:33
    Hi Art,

    GENLIN indeed implements GEE estimation, but GEE assumes the missing data mechanism is MCAR, and Nancy's indicated that she thinks it's NMAR, so this would seem pretty risky. I'm not aware of a full-scale way of handling NMAR data, or at least not simply via the application of a single modeling procedure in SPSS or other software.

    Dave

    -------------------------------------------
    David Nichols
    IBM Corporation
    -------------------------------------------








  • 6.  RE:modeling a binomial outcome measured multiple days

    Posted 05-09-2011 10:57

    I agree with David.  SAS, Stata, SPSS, and other packages all fit GEE and generalized linear models.  Just about everything you want to know on the computing side can be found in Hardin and Hilbe.  But if the missing data are a serious problem and the missingness is non-ignorable (neither MCAR nor MAR), then the key issue is how to handle the missing data.  I think there is a good reason why methods for non-ignorable missingness are not available in general-purpose software packages: the mechanism for the missingness is critical and must be modeled.  This is not necessarily an insurmountable obstacle.  To claim confidently that the missingness is not MAR requires some idea of what makes the missingness non-random.  For example, in clinical trials patient dropout is often due to ineffectiveness of the treatment or a serious drug-related adverse event.  Those may be linked to patient demographic characteristics and could possibly be modeled that way if the cause of dropout can be associated with a particular characteristic, such as age or concomitant medications.  This means thinking like a statistician and not just hunting for software to match the problem to a solution.
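
    For reference, the three missingness mechanisms mentioned in this thread can be stated compactly. The notation is introduced here for illustration: $R$ is the indicator of which values are observed, and the data split into an observed part $Y_{\text{obs}}$ and a missing part $Y_{\text{mis}}$.

```latex
% MCAR: missingness is unrelated to any data values
P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R)

% MAR: missingness depends only on what was observed
P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}})

% NMAR (non-ignorable): missingness still depends on the unobserved values
P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \neq P(R \mid Y_{\text{obs}})
```

    Under NMAR the left-hand side cannot be simplified, which is why the missingness mechanism itself has to be modeled.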
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 7.  RE:modeling a binomial outcome measured multiple days

    Posted 05-09-2011 11:22

    I'm getting a bit off the thread here, but I'd like to chime in on MCAR and MAR.  As an aging practitioner who is becoming more cognizant of practical limitations, I think we should leave the concept of MCAR for textbooks.  And as for MAR, it has been my experience that with large enough studies one can often find a few covariates within cross-classifications of which missingness can be considered as random as resources allow.  I don't mean to advocate sloppy practices; however, sometimes elegance of theory may have to take a backseat to ground realities.


    -------------------------------------------
    Mansour Fahimi
    VP, Statistical Research Services
    Marketing Systems Group
    -------------------------------------------







  • 8.  RE:modeling a binomial outcome measured multiple days

    Posted 05-09-2011 11:31
    I agree. I said "likely to be usable for some of your analysis".  I had earlier asked whether  all questions had to be answered in a single model.  I meant that some incomplete but useful information might be found.

    It is certainly crucial to have the subject matter part of the team describe the different reasons why values might be missing.

    Sometimes we have to go back to a client and explain that we cannot reach a particular conclusion, but can only answer question parts a, e, and f, and not b, c, d, and g.


    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------