I mostly use it in ROC diagnostic applications as well. Even then I recommend folks look at the whole ROC curve and not just the AUC, and have discussions with substantive collaborators on whether Sensitivity and Specificity are equally important or whether one is more valued than the other (such as in screening populations). I commonly give the same caution in areas such as the repeated pain score application that Alexandra brings up (nice discussion by Scott below). Looking at the full profile data through Exploratory Data Analysis techniques (Tukey, etc.) informs appropriate analyses.
Consider 3 individuals on different treatments, each measured 5 times on the 5-point Likert pain scale, with trajectories:
Patient 1 (trt A): 5, 4, 3, 2, 1 (decreasing pain)
Patient 2 (trt B): 3, 3, 3, 3, 3 (constant pain)
Patient 3 (trt C): 1, 2, 3, 4, 5 (increasing pain)
All three AUCs are the same. If trajectories like these are common within each treatment, then blindly calculating the AUCs and running significance tests would likely show no perceived differences across treatments, even though the differences are clearly there. In this case, mixed models examining trends would likely be preferred to AUCs (even if the a priori analysis plan specified AUC approaches). If instead the trajectories were more constant and additive (as Scott mentions):
Patient 1 (trt A): 1, 1, 1, 1, 1 (lower pain)
Patient 2 (trt B): 3, 3, 3, 3, 3 (middle pain)
Patient 3 (trt C): 5, 5, 5, 5, 5 (higher pain)
Then I'd expect the mixed model and AUC overall analyses to give comparable testing results (if missing data were not an issue), but would potentially prefer the mixed model interpretations. Given that pain scores generally have high degrees of informative missingness, I'd likely go with a mixed model (MAR) approach with some pattern-mixture or alternative informative missingness sensitivity approaches.
Regardless, I'd always start with the EDA and show my audience as many visual results as possible to support my analytic framework.
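To make the two toy scenarios above concrete, here is a minimal sketch that computes a trapezoidal AUC and a per-patient least-squares slope for each trajectory. It assumes equally spaced visits one time unit apart (the spacing is not given above), and the function names are only illustrative. The slope is a crude stand-in for the trend a mixed model would estimate, not a mixed model itself.

```python
# Trajectories copied from the two examples above.
# Assumption: visits are equally spaced, one time unit apart.
crossing = {"A": [5, 4, 3, 2, 1], "B": [3, 3, 3, 3, 3], "C": [1, 2, 3, 4, 5]}
parallel = {"A": [1] * 5, "B": [3] * 5, "C": [5] * 5}


def trapezoid_auc(scores, times=None):
    """Area under the piecewise-linear profile (trapezoidal rule)."""
    if times is None:
        times = list(range(len(scores)))
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], scores, scores[1:])
    )


def ols_slope(scores, times=None):
    """Least-squares slope of score on time for a single patient."""
    if times is None:
        times = list(range(len(scores)))
    n = len(scores)
    t_bar = sum(times) / n
    y_bar = sum(scores) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, scores))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def summarize(trajectories):
    """Map each treatment label to (AUC, slope)."""
    return {trt: (trapezoid_auc(y), ols_slope(y)) for trt, y in trajectories.items()}


crossing_summary = summarize(crossing)  # identical AUCs, different slopes
parallel_summary = summarize(parallel)  # different AUCs, identical (zero) slopes

for label, summary in (("crossing", crossing_summary), ("parallel", parallel_summary)):
    for trt, (auc, slope) in summary.items():
        print(f"{label:9s} trt {trt}: AUC = {auc:5.1f}, slope = {slope:+.1f}")
```

In the first scenario the AUC column cannot separate the treatments while the slope column can; in the second it is the other way around, which is why plotting the profiles first matters.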
-------------------------------------------
Michael Griswold
Executive Director
Univ MS Medical Center Biostatistics
-------------------------------------------
Original Message:
Sent: 10-06-2011 15:34
From: Michael Chernick
Subject: AUC vs Mixed Modeling
Regarding AUC, I like its use in evaluating operating characteristic curves, such as in the case of the sensitivity and specificity of a diagnostic tool. There the AUC can be compared to chance, as a chance OC curve would be a 45 degree line with AUC=0.50, whereas a good diagnostic tool would have an AUC much greater than 0.50.
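As a small illustration of the chance benchmark, the empirical ROC AUC equals the proportion of (diseased, healthy) pairs in which the diseased subject scores higher, counting ties as one half. The sketch below uses that identity; the marker values are made up for illustration and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(diseased, healthy):
    """Empirical AUC: P(diseased score > healthy score) + 0.5 * P(tie)."""
    wins = 0.0
    for d in diseased:
        for h in healthy:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased) * len(healthy))


# Hypothetical marker values, invented for illustration only.
uninformative = roc_auc([1, 2, 3, 4], [1, 2, 3, 4])  # same distribution in both groups
separating = roc_auc([5, 6, 7, 8], [1, 2, 3, 4])     # diseased always score higher

print(uninformative, separating)
```

A marker with the same distribution in both groups lands on 0.50, and one that separates the groups completely reaches 1.0.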
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 14:50
From: Scott Berry
Subject: AUC vs Mixed Modeling
Alexandra, Sometimes semantics gets in the way of this. Of course, it depends on what AUC and mixed models are being used. We have used "AUC" like measures in cases like this (integrated scores). For example, suppose you had pain readings at times t=4,6,8, and 10. There may not be a clear horizon that is the most important--or more often the team doesn't know where it is the most effective. You could construct an AUC like measure, that is the sum (or average) of the scores across the time periods, AUC = Y4 + Y6 + Y8 + Y10 (can divide by 4 or not). Then compare the mean of this AUC variable using a means test. This quantity has a pretty clear meaning--correlations go away and get added over to inflate the variance of the AUC. The change from one treatment to another has a pretty clear meaning. If two time periods are better for a treatment and two are the same -- you can still have a difference between the groups. You wouldn't understand the granularity of the effect from this analysis -- but it can "work". It avoids a priori selecting one time period -- say t6 -- and missing with no effect there. If there is a consistent effect over time it can become stronger with this type of model. Many measures are already AUC like measures -- averaging pain over 24 hours or a 7 day period. The visits you have are probably 7-day (14?) diaries.
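A minimal sketch of the integrated-score idea above: sum each subject's readings at t=4,6,8,10 and compare group means. The scores are invented for illustration, and the Welch (unequal-variance) t statistic is just one possible "means test", chosen here as an assumption; a pooled two-sample t-test would follow the same pattern.

```python
import math
import statistics

# Hypothetical pain scores at t = 4, 6, 8, 10 (one row per subject).
# These numbers are invented for illustration only.
treatment = [[3, 2, 2, 1], [4, 3, 2, 2], [3, 3, 2, 1], [2, 2, 1, 1]]
control = [[4, 4, 3, 3], [3, 3, 3, 3], [5, 4, 4, 3], [4, 3, 3, 3]]


def integrated_score(visits, average=False):
    """AUC-like score: Y4 + Y6 + Y8 + Y10, optionally divided by the number of visits."""
    total = sum(visits)
    return total / len(visits) if average else total


def welch_t(x, y):
    """Welch t statistic and Satterthwaite degrees of freedom for two samples."""
    vx_n = statistics.variance(x) / len(x)
    vy_n = statistics.variance(y) / len(y)
    se2 = vx_n + vy_n
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(se2)
    df = se2 ** 2 / (vx_n ** 2 / (len(x) - 1) + vy_n ** 2 / (len(y) - 1))
    return t, df


auc_trt = [integrated_score(s) for s in treatment]
auc_ctl = [integrated_score(s) for s in control]
t_stat, df = welch_t(auc_trt, auc_ctl)

print(f"treatment scores: {auc_trt}")
print(f"control scores:   {auc_ctl}")
print(f"Welch t = {t_stat:.2f} on about {df:.1f} df")
```

Because each subject contributes a single number, the within-subject correlation is absorbed into the variance of that number, so the comparison needs no explicit correlation model.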
You could also use a model that has subject and each visit as a factor. There may be additional correlation after the subject specific effects or not. This is a better representation of the error/correlation at each visit, especially if you have any prediction or confidence intervals to construct. Usually this model will have a "constant effect" -- additive -- of the treatment. This is not "too dissimilar" from the AUC model. Once you start allowing interactions -- or treatment effects that can vary at each visit things get very hard--and VERY difficult to interpret--from a testing viewpoint. You could conclude treatment 1 is better at t=4,8 and treatment 2 is better at t=6,10. Of course you couldn't even do much testing in the AUC.
So, do I have a point? I think most statisticians would prefer the latter of these in most cases, but I suspect you won't get much difference in results. Missing data is important here for sure: how do you construct the AUC with missing data? What you do in the model approach is probably easier, and the latter provides nice alternatives (imputation) to LOCF. It would be neat to see some simulations or work to see which is more powerful under different assumptions of effects at the time periods.
We have used both models in similar cases.
-------------------------------------------
Scott Berry
Berry Consultants
-------------------------------------------
Original Message:
Sent: 10-06-2011 14:23
From: Alexandra Hanlon
Subject: AUC vs Mixed Modeling
Thanks for the discussion. The specific scenario of interest here is to determine whether an intervention is effective in reducing pain. Pain is measured repeatedly over treatment using one item having a five point Likert scale. The proposed primary analysis relies on a two-sample t-test comparing mean AUC pain score.
-------------------------------------------
Alexandra L. Hanlon
Associate Research Professor
University of Pennsylvania
-------------------------------------------
Original Message:
Sent: 10-06-2011 13:32
From: Michael Chernick
Subject: AUC vs Mixed Modeling
I appreciate Colleen's point about AUC. We should not reject it unequivocally. In my message I pointed to applications where comparing two AUCs is important. My position was based on the assumption that the data were appropriate for mixed effects linear models, or else Alexandra would not be comparing the two approaches. In such cases, especially if AUC is not the primary endpoint (or quantity of interest), the mixed model is probably better for handling missing data. In calculating AUC, how do you handle cases with missing time points for the concentrations (linear interpolation or some other smoothing/interpolation method)? If several time points are missing, I think this could be a very crude and possibly inaccurate way to estimate AUC. Of course any missing data method will have problems when there is a lot of missing data.
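To illustrate how crude this can get, here is a minimal sketch of a trapezoidal AUC that skips missing readings, which amounts to linear interpolation across each gap. The profile values are hypothetical, and the sketch only handles interior gaps; a missing first or last reading would silently shorten the integration range.

```python
def trapezoid_auc(times, values):
    """Trapezoidal AUC over observed points; None marks a missing reading.

    Joining the neighbours of a missing point with a straight line is
    equivalent to linearly interpolating across the gap.
    """
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(observed, observed[1:])
    )


# Hypothetical profile with a plateau in the middle (values invented).
times = [0, 1, 2, 3, 4]
complete = [1, 5, 5, 5, 1]
one_missing = [1, 5, None, 5, 1]           # interpolation recovers the plateau
three_missing = [1, None, None, None, 1]   # interpolation misses it entirely

auc_complete = trapezoid_auc(times, complete)
auc_one_missing = trapezoid_auc(times, one_missing)
auc_three_missing = trapezoid_auc(times, three_missing)

print(auc_complete, auc_one_missing, auc_three_missing)
```

With one interior point missing the interpolated AUC matches the complete-data value here, but with three missing it drops from 16 to 4, because the straight line between the endpoints never sees the plateau.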
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 13:21
From: Colleen Kelly
Subject: AUC vs Mixed Modeling
I'm not saying that AUC can be applied to every problem. However, I think there would be examples of patient reported outcomes that, similar to the situation in PK, might be best interpreted with an AUC. For example, if a treatment aims to change a behavior like eating or exercise, there could be a lot of variation and no clear trend in how the treatment affects the behavior. In this case, you might be more interested in the total exposure to calories or exercise during the treatment period (which I think would correlate with weight loss) rather than the trend.
-------------------------------------------
Colleen Kelly
Principal Consultant
Kelly Statistical Consulting
-------------------------------------------