ASA Connect

  • 1.  Cross Validation for Logistic Regression

    Posted 04-20-2015 11:48
    This message has been cross-posted to the following eGroups: Young Professionals Group and ASA Connect.
    -------------------------------------------

    Hi. Does anyone know how to do cross-validation for logistic regression in SAS? Thanks!

    ------------------------------
    Tai Yean Teh
    Oklahoma State University
    alvin.teh@okstate.edu

    ------------------------------



  • 2.  RE: Cross Validation for Logistic Regression

    Posted 04-21-2015 08:31

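    Cross-validation can be done in SAS Proc Logistic by outputting the estimated probabilities (phat) and comparing them with whether the event occurred. For example, if phat>.5 and BinaryOutcome=1, then you have agreement. Also, use the crossvalidate keyword in the example below. It adds the cross-validated probabilities to the output data set as variables named XP_ followed by the response level (XP_1 and XP_0 for a 0/1 outcome), and those, rather than phat, are the ones to compare with the outcome.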

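    ods html path="c:\temp"; ods graphics on;
    /* Descending models the higher response level (1 for a 0/1 outcome) */
    Proc Logistic Descending Data=PredictiveModelData outest=betas covout;
    Class ClassVar;
    model BinaryOutcome = ClassVar X1-X5
        / iplots link=logit CLOdds=Both rsquare;
    /* phat is the fitted probability; predprobs= adds IP_ (individual) */
    /* and XP_ (cross-validated) probabilities for each response level  */
    output out=pred p=phat lower=lcl upper=ucl
           predprobs=(individual crossvalidate);
    run;
    ods graphics off; ods html close;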
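
    The comparison step described above could look something like the sketch below. It assumes the output data set pred from the call above and a response coded 0/1, so that the cross-validated event probability is in XP_1. The 0.5 cutoff and the names cvclass and PredictedCV are illustrative choices, not part of the original example.

    ```sas
    /* Classify each case from its cross-validated probability */
    data cvclass;
        set pred;
        /* leave PredictedCV missing when no probability was produced */
        if not missing(XP_1) then PredictedCV = (XP_1 > 0.5);
    run;

    /* Cross-tabulate observed outcome against cross-validated class */
    proc freq data=cvclass;
        tables BinaryOutcome*PredictedCV / norow nocol;
    run;
    ```

    The diagonal cells of the table count the agreements; a different cutoff may be more sensible when the event is rare.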

    Hope this helps.

    ------------------------------
    Brandy Sinco
    Research Associate
    ------------------------------




  • 3.  RE: Cross Validation for Logistic Regression

    Posted 04-21-2015 10:12

    Perhaps the easiest way to obtain approximate leave-one-out (LOO) cross-validated predicted probabilities in proc logistic is using the Output statement with the PredProbs=(Xvalidate) option.

    http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_logistic_syntax27.htm

    HOWEVER, note that this assumes the model specification is fixed in advance. If you do some predictor selection prior to fitting a model, then the resulting LOO cross-validated predicted probabilities from this approach will be over-optimistic. Unbiased cross-validated estimates require cross-validation of the whole process (predictor selection and model fitting) and not just model fitting.

    Now, if this is what you are doing, then another approach is to randomly split the dataset into a few partitions, say 4 (i.e., 4-fold cross-validation). You can do this by adding a random variable to the dataset.

     http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p0fpeei0opypg8n1b06qe4r040lv.htm

    Then conduct the whole process of predictor selection and model fitting 4 times, each time using 3 folds for predictor selection and model fitting, and the remaining fold to obtain predicted probabilities on data not used in the modeling process. This way, each case is used for model fitting in 3 of the runs and for independent validation in the other. Because the partition is random, you may expect different results if you start from different partitions, so you can repeat the 4-fold cross-validation a few times using different random partitions and, for each case, average across the sets of 4-fold cross-validated predicted probabilities.
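
    As a rough sketch of the partitioning step, assuming the analysis data set is called PredictiveModelData (the name used earlier in this thread); the seed, the work data set names, and the variable names u and fold are arbitrary:

    ```sas
    /* Attach a uniform random number to every observation */
    data shuffled;
        set PredictiveModelData;
        call streaminit(20150421);   /* arbitrary seed, for reproducibility */
        u = rand("uniform");
    run;

    /* Sorting by the random number puts the rows in random order */
    proc sort data=shuffled;
        by u;
    run;

    /* Deal the shuffled rows into 4 folds of (nearly) equal size */
    data folds;
        set shuffled;
        fold = mod(_n_, 4) + 1;      /* values 1, 2, 3, 4 */
        drop u;
    run;
    ```

    Changing the seed gives a different random partition, which is what the repeated 4-fold scheme needs.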

    To store a fitted model and apply it to a different dataset you can use the Store statement in proc logistic, combined with proc plm.

    http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_logistic_syntax33.htm

    http://support.sas.com/documentation/cdl/en/statug/67523/HTML/default/viewer.htm#statug_plm_examples01.htm  
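
    One way the pieces might fit together is sketched below. It assumes a data set named folds that holds the analysis data plus a variable fold taking the values 1 to 4, a 0/1 response BinaryOutcome, and candidate predictors X1-X5. The macro name cv4, the item store names fit1-fit4, the output names, and the choice of stepwise selection are all illustrative, and the sketch relies on the item store holding the model that selection ends with.

    ```sas
    %macro cv4;
        %do k = 1 %to 4;
            /* Selection and fitting see only the 3 training folds */
            proc logistic data=folds(where=(fold ne &k)) descending;
                model BinaryOutcome = X1-X5 / selection=stepwise;
                store work.fit&k;
            run;

            /* Score the held-out fold; ilink returns probabilities */
            proc plm restore=work.fit&k;
                score data=folds(where=(fold = &k)) out=scored&k
                      predicted=cv_phat / ilink;
            run;
        %end;

        /* Stack the 4 held-out folds: one prediction per case */
        data cvpred;
            set scored1-scored4;
        run;
    %mend cv4;

    %cv4
    ```

    Each row of cvpred then carries cv_phat, a predicted probability from a model that never saw that row, either for selection or for fitting.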

     

     

    ------------------------------
    Andres Azuero

    University of Alabama at Birmingham
    ------------------------------




  • 4.  RE: Cross Validation for Logistic Regression

    Posted 04-22-2015 10:00
    A better way is to divide the data set into a training set and a validation set.  Train your model on the training set and then see how well it predicts the outcome of the observations in the validation set.
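
    A minimal sketch of such a split, assuming the data set PredictiveModelData, the 0/1 response BinaryOutcome and the predictors X1-X5 mentioned earlier in the thread; the 70/30 ratio, the seed, and the output names are arbitrary choices:

    ```sas
    /* Flag a random 70% of rows as training; outall keeps every row */
    /* and adds the indicator variable Selected (1 = sampled)        */
    proc surveyselect data=PredictiveModelData out=split
                      samprate=0.7 seed=12345 outall;
    run;

    /* Fit on the training rows, then score the validation rows */
    proc logistic data=split(where=(Selected=1)) descending;
        model BinaryOutcome = X1-X5;
        score data=split(where=(Selected=0)) out=valid_scored fitstat;
    run;
    ```

    The fitstat option prints fit statistics for the scored data, and valid_scored holds the predicted probabilities for the validation rows.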

    ------------------------------
    Aubrey Magoun
    Consultant
    Applied Research & Analysis, Inc.
    ------------------------------




  • 5.  RE: Cross Validation for Logistic Regression

    Posted 04-23-2015 03:22
    Hi.  Since you did not mention what type of cross validation you want to do, I would suggest reading the appropriate chapter in Frank Harrell, Regression Modeling Strategies.  All the choices are clearly explained and Harrell has developed software for all of them.  The drawback may be that the software is written in R and S, not SAS. I would suggest using Harrell's packages because they are very good.  By the way, the variable selection comment above is well taken.  If you use variable selection, I would recommend adaptive lasso (Google "Boos Adaptive lasso R" for a website that gives programs and explanations).  As you might guess, I would suggest going to R for this rather than SAS because everything you might want is available in R and it's easy to use. Best of luck.

    ------------------------------
    David Booth
    Professor Emeritus
    Kent State University
    ------------------------------




  • 6.  RE: Cross Validation for Logistic Regression

    Posted 04-23-2015 11:00


    ------------------------------
    Bruce Lund
    ------------------------------