
  • 1.  Would like some advice on model setup

    Posted 02-14-2011 14:19
    Hi all,

    I was wondering if anyone on this mailing list might be able to give me some advice on a certain problem I've been tasked with. I have a particularly complex and challenging study design, and I'm trying to determine the best way to parameterize and set up the model.

    The study design involves each subject getting a number of measurements repeated over the course of 1 or 2 periods. In Period 1, all subjects are observed the same number of times. At each measurement, the presence/absence of a particular Outcome Variable of interest is obtained (as a binary Yes/No), along with several (continuous) Factors believed to be predictive of the Outcome. Note that the presence of the Outcome at time t does not automatically imply it will be present at time t+1 or later times.

    If a subject has exhibited presence of the Outcome frequently enough in Period 1 to meet a specified threshold, they proceed to Period 2. Otherwise, they are discontinued and don't have any observations in Period 2. For subjects who proceed to Period 2, there is not a set number of observations like there is in Period 1, but a small range of possible numbers of observations. However, even the subject(s) who end(s) up with the most observations in Period 2 will still have far fewer observations than the number of times they were measured in Period 1.

    So, there are two questions I would like to investigate. First, without considering the Outcome at all, I would like to examine whether there are any significant differences in the values of any of Factor_1 through Factor_j, in general, between observations in Period 1 and Period 2, taking into account the multiple observations for the same subject. Might this be as simple as a GLM with MANOVA, with each of the Factors as a response and Period as the explanatory variable? (If I didn't have to account for the multiple observations for the same subject, and there was only 1 observation per subject in each Period, I would probably consider something like a paired t-test.)

    Second, I would like to examine whether there are any significant differences in how Factor_1 through Factor_j model the Outcome between observations in Period 1 and Period 2, again taking into account that there are multiple observations for the same subject. If it were just one period, I think it would be fairly straightforward: something like a logistic regression with Factor_1 through Factor_j as the explanatory variables and the Outcome Variable as the response, stratified by subject, and then looking at which of the Factors are predictive. However, here I'm not interested in which of the Factors have a significant relationship with the Outcome, but rather in what (if any) are the significant differences between Period 1 and Period 2 in how the Factors model the Outcome. In other words, I might get one value for the coefficient of Factor_2 based on Period 1 and another value based on Period 2, and I would like to know whether the difference between them is meaningful. One method I was thinking about was a proportional hazards model with conditional logistic regression, stratifying at the subject level. Another idea was to set up a GEE, but I was having trouble coming up with a model that properly accounted for everything.
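
    To make the GEE idea concrete, here is a rough sketch of the kind of model I had in mind, using only two Factors. The data set name (study), the 0/1 coding of outcome, and the exchangeable working correlation are all assumptions for illustration, and the sketch does not address the fact that only subjects who meet the threshold in Period 1 reach Period 2. The factor*period interaction terms are what would carry the Period 1 versus Period 2 difference in each coefficient.

    ```sas
    /* Sketch only: study, subject, period, outcome, factor_1, factor_2 are hypothetical names. */
    /* outcome is assumed coded 1 = present, 0 = absent; DESCENDING models Pr(outcome = 1).     */
    proc genmod data=study descending;
      class subject period;
      /* factor_k*period asks whether the slope of factor_k differs between periods */
      model outcome = period factor_1 factor_2
                      factor_1*period factor_2*period
                      / dist=binomial link=logit type3;
      /* GEE with an exchangeable working correlation within subject */
      repeated subject=subject / type=exch;
    run;
    ```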


    Any advice you might be able to offer would be most appreciated!

    Best Regards,
    Gabriel Farkas




  • 2.  RE:Would like some advice on model setup

    Posted 02-14-2011 14:58


    -------------------------------------------
    Richard Browne
    Texas Scottish Rite Hospital for Children
    -------------------------------------------
    You must think we're smart.









  • 3.  RE:Would like some advice on model setup

    Posted 02-16-2011 14:38

    For the first question:

    I am a big proc mixed fan and do not use glm if I can avoid it.  I am not sure whether this is something you can use, or whether I am missing the boat completely:

    If you just want to compare periods for each factor, accounting for each of the other factors, you could do this for the subjects who have observations in both periods:

    proc mixed;
      class subject period time;
      model factor_k = factor_1 ... factor_k-1 factor_k+1 ... factor_j period time / ddfm=sat;
      random subject;
      repeated / subject=subject*period type=ar(1);
    run;

     

    This assumes the repeated times are equally spaced; otherwise you need another covariance type. You need the ddfm=sat option in the model statement (I used sat rather than kr because I have yet to see them differ and sat runs faster), but you may have to adjust it using ddf= with the proper denominator degrees of freedom if the subject covariance parameter is estimated to be 0.
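
    If the times are not equally spaced, one alternative type is the spatial power structure, sp(pow), which lets the correlation decay with the actual time gap. A minimal sketch, assuming three factors with factor_2 as the response, a data set called study, a numeric time variable, and a numeric copy of it called t (all of these names are hypothetical):

    ```sas
    /* Hypothetical names: study, study2, t, factor_1..factor_3 */
    data study2;
      set study;
      t = time;   /* numeric copy of time, used as the coordinate for sp(pow) */
    run;

    proc mixed data=study2;
      class subject period time;
      model factor_2 = factor_1 factor_3 period time / ddfm=sat;
      random subject;
      /* correlation between two observations in the same subject*period is rho**|t1 - t2| */
      repeated / subject=subject*period type=sp(pow)(t);
    run;
    ```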

     

    Another repeated structure is cs (compound symmetry), which is most easily incorporated like this:

     

    proc mixed;
      class subject period time;
      model factor_k = factor_1 ... factor_k-1 factor_k+1 ... factor_j period time;
      random subject subject*period;
    run;

     

    which is equivalent to type=cs in the repeated statement.  You won't need or want the ddfm option here.
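
    Written with a repeated statement instead, the same compound symmetry structure would look roughly like this (a sketch with hypothetical names: three factors, factor_2 as the response, data set study). The two forms should agree as long as the estimated within-period covariance is not negative, since a variance component in the random statement cannot go below 0 by default.

    ```sas
    /* Hypothetical names: study, factor_1..factor_3 */
    proc mixed data=study;
      class subject period time;
      model factor_2 = factor_1 factor_3 period time;
      random subject;
      /* compound symmetry among observations within each subject*period */
      repeated / subject=subject*period type=cs;
    run;
    ```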

     

    Is it of interest to see whether the factors differ between the subjects who are only in Period 1 and those who are in both periods?

    Best

    Susanne

    -------------------------------------------
    Susanne Aref
    Aref Consulting Group LLC
    -------------------------------------------