NLMIXED and GEE for binary response not converging

  • 1.  NLMIXED and GEE for binary response not converging

    Posted 03-24-2012 09:48
    Dear All,

    I am fitting various binary response models to a data set consisting of 566 patients.

    The response (141 ones and 422 zeros) is a binary variable, and I have a mixture of continuous and categorical predictors.

    When I fit the logistic regression model it gives me the following results:

    Analysis of Maximum Likelihood Estimates

    Parameter       Level     DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
    Intercept                  1    1.9440          0.7804           6.2053      0.0127
    GENDER          F          1   -0.3151          0.2289           1.8950      0.1686
    gr_revenu       High_SES   1   -1.3553          0.7477           3.2858      0.0699
    gr_revenu       Low_SES    1    0.6459          0.2315           7.7823      0.0053
    Smoking_Status  NO         1   -0.9543          0.2898          10.8437      0.0010
    XX1                        1    0.3949          0.2794           1.9981      0.1575
    X_1                        1   -0.1350          0.0316          18.3136      <.0001
    X_2                        1    0.0131         0.00919           2.0390      0.1533
    X_3                        1   -0.1488          0.1168           1.6225      0.2027



    Hosmer and Lemeshow Goodness-of-Fit Test

    Chi-Square  DF  Pr > ChiSq
       12.2010   8      0.1425


    The model's sensitivity is 27.7% and its specificity is 93.6%. The area under the ROC curve is 0.7381. We also looked at the influence plots and did not detect any outliers or influential observations.

    However, since I expect some (exchangeable) correlation among the responses of patients who were treated by the same physician, I want to estimate this correlation and, if possible, test its significance statistically.



    I tried to fit the model with PROC NLMIXED so that I could use the estimated variance of the random effect to estimate the intracluster correlation coefficient, but the model doesn't converge.
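
    For reference, with a random intercept for physician the intracluster correlation is often defined on the latent (logit) scale. The expression below is a sketch of that one common definition, not the only possible one; the symbols $\beta$, $u_j$ and $\sigma_u^2$ are notation assumed here, with $\sigma_u^2$ denoting the variance of the physician random effect and $\pi^2/3$ the variance of the standard logistic distribution.

    ```latex
    \operatorname{logit} P(y_{ij} = 1 \mid u_j) = x_{ij}^{\top}\beta + u_j,
    \qquad u_j \sim N(0, \sigma_u^2),
    \qquad
    \rho = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
    ```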

    I also tried to fit a GEE model; however, I keep getting the following error: "The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate."

    I am guessing that too many predictors (an overfitted model) is the reason. I would really appreciate your comments and suggestions.

    Thank you.
    Tasneem



    -------------------------------------------
    Tasneem Zaihra
    Assistant Professor
    Concordia University
    Montreal, QC, Canada
    -------------------------------------------


  • 2.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-24-2012 10:08
    It's hard to say without seeing the code for the GEE. You should be using the compound symmetry correlation model; the model in which you estimate all the parameters of the correlation matrix often doesn't converge, though usually with a different error message. This sounds more like you either have a lot of singleton groups or have not defined your grouping (cluster) variable correctly.
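
    One way to check for singleton groups is to tabulate the cluster sizes. This is a minimal sketch; the data set name abc and the cluster variable PHYSICIAN are assumptions about your data, and cluster_sizes is just a hypothetical name for the output data set.

    ```sas
    /* Count patients per physician; the OUT= data set holds one row per
       physician with the variables PHYSICIAN, COUNT and PERCENT. */
    proc freq data=abc noprint;
      tables PHYSICIAN / out=cluster_sizes;
    run;

    /* Distribution of cluster sizes: the row COUNT=1 gives the number of
       singleton clusters. */
    proc freq data=cluster_sizes;
      tables COUNT;
    run;
    ```

    A cluster of size n contributes n(n-1)/2 within-cluster response pairs, so singleton clusters contribute none to the estimate of the working correlation.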

    I haven't used NLMIXED very much, but we had to be very careful in our model and variance specification to get convergence. For your outcomes, PROC GLIMMIX would be a much better choice than NLMIXED, but you ought to be able to get what you are looking for quite easily using PROC GENMOD to fit your GEE.
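
    A rough sketch of what the GLIMMIX version might look like, assuming the data set is named abc, the response y is coded 0/1, and the predictors carry the names shown in the logistic output. Which predictors belong on the CLASS statement is a guess, so adjust it to your data.

    ```sas
    proc glimmix data=abc method=quad;   /* adaptive quadrature: a true likelihood */
      class PHYSICIAN GENDER gr_revenu Smoking_Status;
      model y(event='1') = GENDER gr_revenu Smoking_Status XX1 X_1 X_2 X_3
            / dist=binary link=logit solution;
      /* Random intercept per physician: a single variance parameter */
      random intercept / subject=PHYSICIAN;
      /* Likelihood ratio test that the physician variance is zero */
      covtest 'No physician variance' zerog;
    run;
    ```

    With only one variance parameter to estimate there is no large correlation matrix to fit, which is why this specification tends to be easier to get to converge.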

    Ray

    -------------------------------------------
    Raymond Hoffmann
    Professor
    Medical College of Wisconsin
    -------------------------------------------








  • 3.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-24-2012 11:28
    I would be interested in knowing more details about the problem; that would help us advise you. For instance, what is this binary response variable, and what do some of these variables mean? Gender and smoking status are obvious, but the others are not. I should also emphasize that a singularity condition is worse than overfitting. You can fit too many variables and still have a unique solution; the estimates would just be highly variable and the model might not be very reliable. In your situation, however, the model is so overparameterized that there is no unique solution, so the algorithm cannot converge. From the logistic regression fit, it looks like you could also drop some parameters, such as GENDER, X_2, and X_3. But without knowing what some of these variables are, and with no context for why they are included, I cannot know whether there is a clinical reason to include them.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 4.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-24-2012 11:34
    I agree that the error message is suggesting a more structured specification for the correlation matrix.  I think it is pretty clear that the error message is telling you that there is a singularity condition which means that there is not a unique solution and hence the algorithm won't converge.  With compound symmetry or an AR(1) structure you probably won't have this difficulty. If you know that compound symmetry is the appropriate structure then use it.  Otherwise I would try a few like Toeplitz or AR(1) to see if there is any sensitivity to the correlation structure selected.  Usually there is not. 

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 5.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-24-2012 21:24
    I'm confused.  What is the random effect?  Is this a case of repeated measures on the 566 subjects and the logistic regression is based on a single observation for each subject?  Is it that the subjects are clustered somehow together?  I go back to the first questions that have already been asked.  Tell us more about the problem and share with us the code used to model the random effects.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 6.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 10:40
    Dear All,

    Thank you for all your comments and suggestions.

    As per Michael's suggestion, I will try the other variance structures. However, I still have two questions:
    • We are more interested in subject-specific models, and I think fitting a mixed effects logistic regression is a better answer than a GEE model. Would you agree with me on that? If so, my remaining issues are estimating the intracluster correlation and testing its significance with a likelihood ratio test that compares the fixed effects logistic regression with the mixed effects logistic regression. Both require fitting the mixed effects logistic regression, and my model is not converging, so I am rather lost. Any help would be really appreciated.

    • When using PROC NLMIXED or GEE, is there a specific way the clustering variable should be coded? Does the coding affect the results? For instance, say I have subject=PHYSICIAN in my PROC NLMIXED RANDOM statement or repeated subject=PHYSICIAN in the GEE: will it affect my results or create convergence issues if I code PHYSICIAN as (1, 2, 3, 4) rather than as (64055, 65471, 56432)? If it does, how can I change the coding scheme for a variable in SAS? Currently PHYSICIAN is coded with 5-digit numeric codes (e.g., physician A has code 64055, physician B has code 65275, and so on).

      PS: There are 58 physicians (clusters), and the cluster size (number of patients treated by each physician) varies from 1 to 39.
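
      For reference, a minimal sketch of how the cluster variable could be prepared, assuming the data set is named abc. The numeric labels themselves should not matter, but PROC NLMIXED starts a new cluster whenever the value of the SUBJECT= variable changes from one observation to the next, so the data should be sorted by PHYSICIAN first. The names abc_sorted, abc_recoded and cluster_id are hypothetical, and the sequential recoding is optional.

      ```sas
      /* Group all patients of the same physician together */
      proc sort data=abc out=abc_sorted;
        by PHYSICIAN;
      run;

      /* Optional: replace the 5-digit codes with sequential ids 1, 2, 3, ... */
      data abc_recoded;
        set abc_sorted;
        by PHYSICIAN;
        /* Sum statement: cluster_id starts at 0 and is retained across rows */
        if first.PHYSICIAN then cluster_id + 1;
      run;
      ```

      For the GEE in PROC GENMOD the input data do not need to be sorted, but the subject variable must be listed on the CLASS statement.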

      Looking forward to your comments and suggestions.

      Best Regards, 
         Tasneem


    -------------------------------------------
    Tasneem Zaihra
    Assistant Professor
    Concordia University
    Montreal, QC, Canada
    -------------------------------------------








  • 7.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 12:15

    I can't comment on your second question but will comment on the first. If the response variable in the GEE is the same binary variable that you are using in the logistic regression model, I don't see where it makes any difference. Both use the covariates to predict the binary outcome. Maybe the logistic regression model is not converging for the same reason as with NLMIXED and GEE: not enough data to fit so many parameters (n<p). The solution would be either to reduce the number of variables in the logistic regression model or to just use the GEE model with a structured covariance. The fixed effects model provides a clue as to which variables to drop: possibly GENDER, XX1, and X_2.
    The number of covariates in the fixed effects model is only 7 with 566 observations, so it could only be that in the mixed model you are adding too many correlation parameters.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 8.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-28-2012 10:15

    Dear All,

    Thank you for your interest in my problems. I really appreciate all your suggestions and comments. I have a few comments as well:

    • Michael, in one of your posts you mentioned, "If the response variable in the GEE is the same binary variable that you are using in the logistic regression model I don't see where it makes any difference."
    • The difference is in the interpretation of the parameter estimates. GEE is a marginal effects model, and its parameter estimates are used to draw inferences about population-averaged effects, while the mixed effects logistic regression model is a subject-specific model, and its parameter estimates are used to draw inferences specific to subjects. Depending on our study objective, we would fit different models. Does the group agree with me?
    • Michael and Robert, thank you for the suggestion of a GEE model with a structured covariance; it works when I change the type from unstructured to compound symmetry. Just one more question, though: in the output, just below the working correlation matrix, it reports Exchangeable working correlation = -0.01757. Is there a way I can test its significance? The group was suspecting some correlation between the responses of patients seeing the same physician. Also, I get the following warning:

    • Michael, the code that I sent for PROC NLMIXED had the initial parameter estimates commented out because I first tried to run the model with initial parameter estimates, and when that didn't converge I tried the default starting values and had no luck then either. The initial parameter estimates that I was using were the estimates obtained from the logistic regression model.
    PS: There are 10 physicians (clusters) who have only one patient. Thus, I have 10 clusters of size one.

    • Robert, when you ask me to constrain my correlation matrix in NLIN, what do you mean? I am sorry, but I don't get it. Following is the code I am using:
    PROC NLMIXED DATA=abc;
      parms b0=0.1922 b1=0.4 b2=0.8 b3=0.370 b4=1.3 b5=0.23 b6=11.97 b7=1 b8=1.4 b9=0.5;
      xb = b0 + u + .....;
      p = exp(xb)/(1+exp(xb));
      model y ~ binary(p);
      random u ~ normal(0,s2u) subject=PHYSICIAN;
    run;
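
    For comparison, a minimal sketch of how the same random-intercept model might be written with an explicit starting value and a lower bound for s2u. It deliberately uses only two of the predictors (X_1 and X_2) to stay short. The starting values, the qpoints=10 setting, and the assumption that abc is already sorted by PHYSICIAN are illustrative choices, not part of the code above.

    ```sas
    proc nlmixed data=abc qpoints=10;   /* abc assumed sorted by PHYSICIAN */
      /* Illustrative starting values, including one for the variance s2u */
      parms b0=1.9 b1=-0.14 b2=0.013 s2u=0.5;
      bounds s2u >= 0;                  /* a variance cannot be negative */
      xb = b0 + b1*X_1 + b2*X_2 + u;    /* linear predictor with random intercept u */
      p  = 1 / (1 + exp(-xb));          /* inverse logit */
      model y ~ binary(p);
      random u ~ normal(0, s2u) subject=PHYSICIAN;
      /* Latent-scale intracluster correlation: s2u / (s2u + pi^2/3) */
      estimate 'ICC (latent scale)' s2u / (s2u + constant('pi')**2 / 3);
    run;
    ```

    The ESTIMATE statement reports the intracluster correlation together with an approximate standard error, which is one way to get at the quantity of interest once the model converges.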

    Once again, thank you for all your suggestions. They are immensely helpful.

    I look forward to any further comments and suggestions.

    Best Regards,
    Tasneem
    -------------------------------------------
    Tasneem Zaihra
    Assistant Professor
    Concordia University
    Montreal, QC, Canada
    -------------------------------------------








  • 9.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-28-2012 10:22

    I believe you can test the correlation using the covtest option in the PROC GENMOD statement.

    When I first posted, I hadn't looked carefully at your code for NLMIXED. I looked at it after one of Michael's e-mails and noticed that the code should have worked, since you are only including a variance among physicians. The potential problem that I see is in the distribution of the number of subjects within physician. If you have many physicians with a single subject, then you might not be able to estimate a variance (e.g., the variance among physicians) because you have no real information about the mean (each physician's mean).

    Hope this helps.
    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 10.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 12:56
    Did you use NLMIXED for the mixed effects logistic model? If so, you included a RANDOM statement, which assumes a multivariate normal distribution for the random effects. In the RANDOM statement you would have specified the covariance matrix for the random effects. How many random effects did you use, and how many variances and covariances were left to estimate for the normal distribution?

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 11.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 13:16
    I think I know what your problem is with NLMIXED. In checking through the description in the SAS user's manual, it appears that you need to specify initial values for all the parameters. When I look at the RANDOM statement in the code you sent to me, I see a parameter s2u that is not included in the initial values in the PARMS statement. Furthermore, it looks like you commented out the entire PARMS statement, which means that you did not specify initial values for any of the parameters.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 12.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 14:11
    I am pretty sure SAS assumes an initial value of either 0 or 1 for any parameters not explicitly specified in a PARMS statement in NLMIXED.

    -------------------------------------------
    Gabriel Farkas
    -------------------------------------------








  • 13.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 14:23

    What you say may very well be true.  That only means that SAS will run the nonlinear algorithm.  It does not mean that the algorithm will converge.  The default starting value may be terrible and the algorithm could fail to converge.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 14.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 15:34

    The SAS nlin manual has a nice description of convergence (or not) and various options.

    http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_nlin_sect023.htm



    -------------------------------------------
    Chris Barker, Ph.D.
    President - San Francisco Bay Area Chapter of the American Statistical Association
    www.barkerstats.com

    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    -------------------------------------------








  • 15.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-25-2012 23:00
    The problem here is that the subjects are clustered by physician, with each physician having 1-39 subjects.  If you use an unstructured covariance matrix, then you will have trouble estimating the covariance matrix for the following reason.  The covariance matrix that SAS is trying to fit has dimension equal to the largest number of subjects.  However, you will not have sufficient data to estimate all of these variances since only a few physicians (maybe only 1) will have 39 subjects.  You need to use a structured covariance matrix that makes sense here, perhaps a covariance matrix that contains only the variance due to physician.  I believe that you will not have trouble with either GEE or NLIN if you constrain your covariance matrix.

    I hope this helps.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 16.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-26-2012 09:33
    Yes, but she said in her first post that she was trying to fit an exchangeable correlation structure.  And exchangeable is a synonym for compound symmetry in Proc GenMod.    

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------






  • 17.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-26-2012 09:47

    The actual code that I saw used unstructured.  I have the following:

    repeated subject=PHYSICIAN / type=unstr corrw;

    The correct statement here for compound symmetry would be as follows:

    repeated subject=PHYSICIAN / type=cs corrw;
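
    In context, the full GEE call might look roughly like the sketch below. The data set name abc, the response y, and the choice of CLASS variables are assumptions on my part, based on the logistic output and the NLMIXED code posted earlier.

    ```sas
    proc genmod data=abc descending;    /* DESCENDING: model P(y = 1) for 0/1 coding */
      /* The subject variable must appear on the CLASS statement */
      class PHYSICIAN GENDER gr_revenu Smoking_Status;
      model y = GENDER gr_revenu Smoking_Status XX1 X_1 X_2 X_3
            / dist=binomial link=logit;
      /* Exchangeable (compound symmetry) working correlation: one parameter */
      repeated subject=PHYSICIAN / type=cs corrw;
    run;
    ```

    With type=cs only a single correlation parameter is estimated, whereas type=unstr asks for a separate correlation for every pair of positions within a cluster of up to 39 patients.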

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 18.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-26-2012 11:32
    ooh, yuck, no wonder.  Thank you.

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 19.  RE:NLMIXED and GEE for binary response not converging

    Posted 03-26-2012 10:05

    Yes, Eric. You haven't been following the thread very closely. In spite of an intent to use compound symmetry, unstructured was specified in the code. That was changed and the issue resolved. Now the problem is with the use of NLMIXED, where there are only 8 parameters. Looking at the code, I noticed that starting values for the model parameters were not specified, and the default starting values might not be good ones. So that could explain the lack of convergence, I think.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------