
  • 1.  Comparing logistic and neural Models

    Posted 03-14-2012 20:14
    This message has been cross-posted to the following eGroups: Statistical Computing Section and Statistical Consulting Section.
    -------------------------------------------

    Dear all,

     
    A friend of mine has predicted a binary event (low back pain) with two models: logistic regression and an artificial neural network. To compare the predictive ability of the two models, she calculated the area under the ROC curve for each, and the two areas are approximately the same. However, the Stata output shows a significant difference between the two curves. She has submitted her article to a journal, and because the areas are nearly the same while the p-value is significant, the reviewer thought the calculation was wrong.

    . roccomp LBP Logit Neural

                                ROC                  -Asymptotic Normal--
                     Obs       Area    Std. Err.     [95% Conf. Interval]
    ---------------------------------------------------------------------
    Logit          16233     0.7523       0.0045      0.74353     0.76108
    Neural         16233     0.7536       0.0045      0.74489     0.76239
    ---------------------------------------------------------------------
    Ho: area(T_L) = area(T_N)    chi2(1) = 8.54       Prob>chi2 = 0.0035

     

    We think the reviewer concluded that the calculation is wrong because the confidence intervals for the two areas overlap to a large extent. The covariance matrix is as follows:

    var(area_Logit)= .00002004        var(area_Neural)= .00001992        cov(area_Logit, area_)=.00001988

    She wants to answer the reviewer's objection in this way:

    "As the output shows, the two confidence intervals overlap substantially, which suggests that the null hypothesis (equality of the two ROC areas) should not be rejected. However, because the two models were applied to the same data set, their estimated areas are correlated (corr = 0.99), and this correlation must be taken into account in the hypothesis test. The test statistic for equality of the two areas is therefore:

    X^2= (0.7523-0.7536)^2 / (0.00002004+0.00001992-2*0.00001988)=8.45

    and we believe the significant result is not wrong."
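
    The hand calculation in the quoted reply can be checked with a short script. This is only a sketch of the arithmetic described above, not a statement of what Stata does internally: the variable names are made up for illustration, the numbers are copied from the output and covariance matrix in this post, and the 1-df chi-squared p-value is computed from the complementary error function.

```python
import math

# Values copied from the post (roccomp output and covariance matrix).
auc_logit, auc_neural = 0.7523, 0.7536
var_logit, var_neural = 0.00002004, 0.00001992
cov_logit_neural = 0.00001988


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


diff = auc_logit - auc_neural

# Variance of the difference when the two estimates are correlated (paired).
var_diff_paired = var_logit + var_neural - 2.0 * cov_logit_neural
chi2_paired = diff ** 2 / var_diff_paired
p_paired = chi2_1df_pvalue(chi2_paired)

# Same statistic if the covariance term were ignored (independent samples).
var_diff_indep = var_logit + var_neural
chi2_indep = diff ** 2 / var_diff_indep
p_indep = chi2_1df_pvalue(chi2_indep)

# Correlation between the two AUC estimates implied by the covariance matrix.
corr = cov_logit_neural / math.sqrt(var_logit * var_neural)

print(f"correlation           = {corr:.3f}")
print(f"chi2, with covariance = {chi2_paired:.2f} (p = {p_paired:.4f})")
print(f"chi2, no covariance   = {chi2_indep:.2f} (p = {p_indep:.4f})")
```

    With the posted values, the statistic that includes the covariance comes out at about 8.45, matching the hand calculation, while dropping the covariance term gives roughly 0.04. That contrast is why overlapping confidence intervals are misleading for two models fitted to the same data.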

    She has searched the Stata 12 documentation for how it compares two ROC curves, but she could not find a satisfactory explanation.

    If Stata uses the same method she did, please let us know. If not, please point us to a good reference on the comparison algorithm.

    Also, please let us know whether her way of calculating the chi-squared statistic for this kind of comparison is correct.

    Your contribution and discussion will be appreciated.

    Many thanks,

    Amir



    -------------------------------------------
    Amir Kasaeian
    PhD Student in Biostatistics
    Tehran University of Medical Sciences (TUMS)
    amir_kasaeian@yahoo.com
    akasaeian@razi.tums.ac.ir
    -------------------------------------------


  • 2.  RE:Comparing logistic and neural Models

    Posted 03-16-2012 22:33
    On the one hand, I get the same chi-square as your friend when I use the covariance matrix you provided.  On the other hand, I think the only reason your friend found a statistically significant difference is that she had a very large sample of 16,233 data points.  Think of it this way: if your friend chooses 1/10 of her 16,233 data points at random and re-runs her procedures on this random subset, then, for the same observed difference, the chi-square for the methods comparison would be about 1/10 of 8.45.  In other words, a chi-square of only about 0.845 on a data set with more than 1600 data points in it, which, I'm sure you'll agree, does not sound very significant.
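
    The sample-size argument can be illustrated numerically. The sketch below rests on two simplifying assumptions that are not in the original data: the observed AUC difference is held fixed, and the variances and covariance of the AUC estimates scale as 1/n. The function names are hypothetical.

```python
import math

# Values from the thread; the 1/n variance scaling is an assumption of this sketch.
n_full = 16233
diff = 0.7523 - 0.7536
var_diff_full = 0.00002004 + 0.00001992 - 2.0 * 0.00001988


def chi2_at_sample_size(n):
    """Chi-squared statistic if the same AUC difference were observed with n points."""
    var_diff = var_diff_full * n_full / n  # variance grows as the sample shrinks
    return diff ** 2 / var_diff


def chi2_1df_pvalue(x):
    """Upper-tail p-value of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


for fraction in (1.0, 0.5, 0.1):
    n = round(n_full * fraction)
    stat = chi2_at_sample_size(n)
    print(f"n = {n:5d}  chi2 = {stat:.3f}  p = {chi2_1df_pvalue(stat):.3f}")
```

    Under these assumptions the statistic is simply proportional to n, so the same 0.0013 difference that is significant at n = 16,233 is far from significant at one tenth of that size.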

    I think what's going on with your friend versus the reviewer is, there are two kinds of significance: statistical significance and practical significance, and they are very different animals.  The first has to do with "can it happen by chance", whereas the second has to do with "is it useful or interesting or noteworthy".  And your friend is focussed on statistical significance, whereas the reviewer is focussed on practical significance, although perhaps the reviewer does not articulate this very well.  

    If we use practical significance instead of statistical significance, then I will suggest that there is no practically significant difference between the two areas under the ROC curves (AUCs).  The two ROC curves have an average AUC of 0.75295 and a difference between AUCs of 0.0013, for a proportional difference of 0.0017 using the average AUC as the denominator in the proportion calculation. That's 0.17 percent, or 1.7 parts per thousand.  If the goal is to compare two methods for performance, and to compare them using ROC AUC as the metric of performance, then it is hard to read practical significance into a performance difference of only 1.7 parts per thousand. 
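
    For anyone who wants to reproduce the proportional difference, here is the arithmetic as a few lines of Python. The variable names are illustrative; the two AUC values are the ones reported in the Stata output earlier in the thread.

```python
# AUC values as reported in the roccomp output.
auc_logit, auc_neural = 0.7523, 0.7536

mean_auc = (auc_logit + auc_neural) / 2.0
abs_diff = abs(auc_neural - auc_logit)

# Difference relative to the average AUC.
rel_diff = abs_diff / mean_auc

print(f"average AUC         = {mean_auc:.5f}")
print(f"absolute difference = {abs_diff:.4f}")
print(f"relative difference = {rel_diff:.4f} "
      f"({rel_diff * 100:.2f} percent, {rel_diff * 1000:.1f} per thousand)")
```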

    I would like to suggest a different approach: plot the two ROC curves on top of each other.  Do they basically lie on top of each other?  Or do they deviate from each other noticeably? 
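
    If plotting is inconvenient, the same question can be asked numerically. The sketch below builds each empirical ROC curve and reports the largest vertical gap between the two. Everything in it is an assumption for illustration: the helper names `roc_points` and `tpr_at` are hypothetical, the curve is treated as a step function, and the toy labels and scores are made up rather than taken from the study.

```python
def roc_points(labels, scores):
    """Return empirical ROC points (FPR, TPR), starting at (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Rank observations from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all observations tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def tpr_at(points, fpr):
    """Highest TPR reached at or below the given FPR (step-function view)."""
    return max(y for x, y in points if x <= fpr)


# Hypothetical toy data: 1 = event, 0 = no event; scores are invented.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores_logit = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
scores_neural = [0.9, 0.7, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2]

curve_logit = roc_points(labels, scores_logit)
curve_neural = roc_points(labels, scores_neural)

# Compare the two curves on a common grid of false-positive rates.
grid = [k / 100 for k in range(101)]
max_gap = max(abs(tpr_at(curve_logit, f) - tpr_at(curve_neural, f)) for f in grid)

print(f"largest vertical gap between the two curves: {max_gap:.3f}")
```

    The point lists returned by `roc_points` can also be passed to any plotting tool to draw the two curves on the same axes. A gap close to zero across the whole grid would suggest the curves essentially coincide.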

    Also, I have to ask about the 16,233 data points:  Are they all single data points collected from 16,233 different people?  Or did each person in the study contribute multiple data points taken over time?


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------