I mostly use it in ROC diagnostic applications as well. Even then I recommend folks look at the whole ROC and not just the AUC, and have discussions with substantive collaborators on whether Sensitivity and Specificity are equally important or whether one is more valued than the other (such as in screening populations). I commonly give the same caution in areas such as the repeated pain score application that Alexandra brings up (nice discussion by Scott below). Looking at the full profile data through Exploratory Data Analysis techniques (Tukey, etc.) informs appropriate analyses.
Consider 3 individuals on different treatments, each measured 5 times on the 5-point Likert pain scale, with trajectories:
Patient 1 (trt A): 5, 4, 3, 2, 1 (decreasing pain)
Patient 2 (trt B): 3, 3, 3, 3, 3 (constant pain)
Patient 3 (trt C): 1, 2, 3, 4, 5 (increasing pain)
All three AUCs are the same. If trajectories like these are common within each treatment, then blindly calculating the AUCs and running significance tests would likely show no perceived differences across treatments, even though the differences are clearly there. In this case, mixed models examining trends would likely be preferred to AUCs (even if the a priori analysis plan specified AUC approaches). If instead the trajectories were more constant and additive (as Scott mentions):
Patient 1 (trt A): 1, 1, 1, 1, 1 (lower pain)
Patient 2 (trt B): 3, 3, 3, 3, 3 (middle pain)
Patient 3 (trt C): 5, 5, 5, 5, 5 (higher pain)
Then I'd expect the mixed model and AUC overall analyses to give comparable testing results (if missing data were not an issue), but would potentially prefer the mixed model interpretations. Given that pain scores generally have high degrees of informative missingness, I'd likely go with a mixed model (MAR) approach with some pattern-mixture or alternative informative-missingness sensitivity approaches.
Regardless, I'd always start with the EDA and show my audience as many visual results as possible to support my analytic framework.
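To make the two toy scenarios above concrete, here is a minimal sketch that computes a trapezoidal AUC and a least-squares slope for each trajectory. The unit spacing between visits and the function names are assumptions made for illustration, not part of any specified analysis plan.

```python
# Illustrative only: trapezoidal AUC vs. least-squares slope for the six
# example trajectories (unit spacing between the 5 visits is an assumption).

def trapezoid_auc(y, t):
    """Area under the piecewise-linear curve through the points (t, y)."""
    return sum((t[i + 1] - t[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(y) - 1))

def ols_slope(y, t):
    """Ordinary least-squares slope of y regressed on t."""
    t_bar = sum(t) / len(t)
    y_bar = sum(y) / len(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx

visits = [1, 2, 3, 4, 5]
crossing = {"A": [5, 4, 3, 2, 1], "B": [3, 3, 3, 3, 3], "C": [1, 2, 3, 4, 5]}
parallel = {"A": [1, 1, 1, 1, 1], "B": [3, 3, 3, 3, 3], "C": [5, 5, 5, 5, 5]}

def summarize(trajectories):
    """Map each treatment to its (AUC, slope) pair."""
    return {trt: (trapezoid_auc(y, visits), ols_slope(y, visits))
            for trt, y in trajectories.items()}

crossing_summary = summarize(crossing)
parallel_summary = summarize(parallel)
for label, summary in (("crossing", crossing_summary), ("parallel", parallel_summary)):
    for trt, (auc, slope) in summary.items():
        print(f"{label:9s} trt {trt}: AUC = {auc:4.1f}, slope = {slope:+.1f}")
```

In the first scenario the AUC column is identical across treatments while the slopes differ; in the second the slopes are all zero and the AUCs carry the whole treatment difference.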
-------------------------------------------
Michael Griswold
Executive Director
Univ MS Medical Center Biostatistics
-------------------------------------------
Original Message:
Sent: 10-06-2011 15:34
From: Michael Chernick
Subject: AUC vs Mixed Modeling
Regarding AUC, I like its use in evaluating operating characteristic curves, such as in the case of the sensitivity and specificity of a diagnostic tool. There the AUC can be compared to chance, since a chance OC curve would be a 45-degree line with AUC = 0.50, whereas a good diagnostic tool would have an AUC much greater than 0.50.
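As a rough illustration of comparing an ROC AUC against the 0.50 chance line, the sketch below builds an empirical ROC curve from made-up scores and integrates it with the trapezoidal rule. The scores, labels, and function names are hypothetical and chosen only to keep the example small.

```python
# Illustrative only: empirical ROC curve and its AUC for made-up test scores.
# Higher score = more suspicious of disease; the data below are hypothetical.

def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per distinct cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, labels) if s >= cut and d == 1)
        fp = sum(1 for s, d in zip(scores, labels) if s >= cut and d == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def roc_auc(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]   # 1 = diseased, 0 = not diseased

points = roc_points(scores, labels)
auc = roc_auc(points)
print(f"AUC = {auc:.2f} (chance line: 0.50)")
for fpr, tpr in points:
    print(f"1 - specificity = {fpr:.1f}, sensitivity = {tpr:.1f}")
```

Printing the individual points alongside the AUC keeps the full curve in view, which matters when sensitivity and specificity are not equally important.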
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 14:50
From: Scott Berry
Subject: AUC vs Mixed Modeling
Alexandra, Sometimes semantics gets in the way of this. Of course, it depends on what AUC and mixed models are being used. We have used "AUC"-like measures in cases like this (integrated scores). For example, suppose you had pain readings at times t=4,6,8, and 10. There may not be a clear horizon that is the most important--or more often the team doesn't know where it is the most effective. You could construct an AUC-like measure that is the sum of the scores across the time periods, AUC = Y4 + Y6 + Y8 + Y10 (can divide by 4 or not). Then compare the mean of this AUC variable using a means test. This quantity has a pretty clear meaning--the correlations no longer need to be modeled; they just get added in and inflate the variance of the AUC. The change from one treatment to another has a pretty clear meaning. If two time periods are better for a treatment and two are the same -- you can still have a difference between the groups. You wouldn't understand the granularity of the effect from this analysis -- but it can "work". It avoids a priori selecting one time period -- say t6 -- and missing with no effect there. If there is a consistent effect over time it can become stronger with this type of model. Many measures are already AUC-like measures -- averaging pain over 24 hours or a 7 day period. The visits you have are probably 7-day (14?) diaries.
You could also use a model that has subject and each visit as a factor. There may be additional correlation after the subject specific effects or not. This is a better representation of the error/correlation at each visit, especially if you have any prediction or confidence intervals to construct. Usually this model will have a "constant effect" -- additive -- of the treatment. This is not "too dissimilar" from the AUC model. Once you start allowing interactions -- or treatment effects that can vary at each visit -- things get very hard--and VERY difficult to interpret--from a testing viewpoint. You could conclude treatment 1 is better at t=4,8 and treatment 2 is better at t=6,10. Of course you couldn't even do much testing in the AUC.
So, do I have a point? I think most statisticians would prefer the latter of these in most cases, but I suspect you won't get much difference in results. Missing data is important here for sure: how do you construct the AUC with missing data? What you do in the model approach is probably easier, and the latter provides nice alternatives (imputation) to LOCF. It would be neat to see some simulations or work to see which is more powerful under different assumptions of effects at the time periods.
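A minimal sketch of the integrated-score idea, assuming complete data and an equal-variance two-sample t statistic as the "means test". All pain readings, group sizes, and function names below are hypothetical.

```python
# Illustrative only: per-subject integrated ("AUC-like") score and a pooled
# two-sample t statistic. All pain readings below are hypothetical.
from math import sqrt
from statistics import mean, variance

def integrated_score(readings):
    """Average of one subject's readings, i.e. (Y4 + Y6 + Y8 + Y10) / 4."""
    return sum(readings) / len(readings)

def pooled_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny)), nx + ny - 2

# Rows are subjects; columns are pain at t = 4, 6, 8, 10.
control = [[4, 4, 3, 4], [3, 4, 4, 3], [5, 4, 4, 4], [4, 3, 4, 3]]
treated = [[4, 3, 3, 2], [3, 3, 2, 2], [4, 4, 3, 3], [3, 2, 2, 2]]

control_scores = [integrated_score(r) for r in control]
treated_scores = [integrated_score(r) for r in treated]
t_stat, df = pooled_t(control_scores, treated_scores)
print(f"mean integrated score: control {mean(control_scores):.2f}, "
      f"treated {mean(treated_scores):.2f}")
print(f"t = {t_stat:.2f} on {df} df")
```

Because each subject contributes a single number, the test only needs independence between subjects; the within-subject correlation is folded into the variance of the scores rather than modeled.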
We have used both models in similar cases.
-------------------------------------------
Scott Berry
Berry Consultants
-------------------------------------------
Original Message:
Sent: 10-06-2011 14:23
From: Alexandra Hanlon
Subject: AUC vs Mixed Modeling
Thanks for the discussion. The specific scenario of interest here is to determine whether an intervention is effective in reducing pain. Pain is measured repeatedly over treatment using a single item on a five-point Likert scale. The proposed primary analysis relies on a two-sample t-test comparing mean AUC pain score.
-------------------------------------------
Alexandra L. Hanlon
Associate Research Professor
University of Pennsylvania
-------------------------------------------
Original Message:
Sent: 10-06-2011 13:32
From: Michael Chernick
Subject: AUC vs Mixed Modeling
I appreciate Colleen's point about AUC. We should not reject it unequivocally. In my message I pointed to applications where comparing two AUCs is important. My position was based on the assumption that the data was appropriate for mixed effects linear models or else Alexandra would not be comparing the two approaches. In such cases, especially if AUC is not the primary endpoint (or quantity of interest), the mixed model is probably better for handling missing data. In calculating AUC for cases with missing time points for the concentrations, what would you do (linear interpolation or some other smoothing/interpolation method)? If several time points are missing I think this could be a very crude and possibly inaccurate way to estimate AUC. Of course any missing data method will have problems when there is a lot of missing data.
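To show how crude interpolation can become, here is a small sketch that computes a trapezoidal AUC from the observed points only, which is equivalent to linear interpolation across the gaps. The trajectory and the missingness patterns are hypothetical, and the sketch assumes the first and last time points are observed.

```python
# Illustrative only: trapezoidal AUC when some visits are missing. Using only
# the observed points is the same as linearly interpolating across the gaps.
# The trajectory and the missingness patterns are hypothetical.

def trapezoid_auc(times, values):
    """Trapezoidal AUC over the observed (time, value) pairs; None = missing."""
    observed = [(t, v) for t, v in zip(times, values) if v is not None]
    return sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(observed, observed[1:]))

times = [1, 2, 3, 4, 5]
complete = [5, 1, 1, 1, 1]            # sharp early drop, then flat
one_missing = [5, None, 1, 1, 1]      # visit 2 missing
two_missing = [5, None, None, 1, 1]   # visits 2 and 3 missing

auc_complete = trapezoid_auc(times, complete)
auc_one = trapezoid_auc(times, one_missing)
auc_two = trapezoid_auc(times, two_missing)
print(auc_complete, auc_one, auc_two)
```

With this nonlinear trajectory the interpolated AUC drifts further from the complete-data value as more consecutive points go missing; a linear trajectory would show no such bias, so the damage depends on the shape of the curve.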
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 13:21
From: Colleen Kelly
Subject: AUC vs Mixed Modeling
I'm not saying that AUC can be applied to every problem. However, I think there would be examples of patient reported outcomes that, similar to the situation in PK, might be best interpreted with an AUC. For example, if a treatment aims to change a behavior like eating or exercise, there could be a lot of variation and no clear trend in how the treatment affects the behavior. In this case, you might be more interested in the total exposure to calories or exercise during the treatment period (which I think would correlate with weight loss) rather than the trend.
-------------------------------------------
Colleen Kelly
Principal Consultant
Kelly Statistical Consulting
-------------------------------------------
Original Message:
Sent: 10-06-2011 13:00
From: Christopher Barker
Subject: AUC vs Mixed Modeling
How do you explain or interpret an AUC of quality of life? It's a patient reported outcome, so to speak, how the patient feels. For example, what would the AUC of the SF-36 domain, Mental Health, mean?
-------------------------------------------
Chris Barker, Ph.D.
President - San Francisco Bay Area Chapter of the American Statistical Association
www.barkerstats.com
-------------------------------------------
Original Message:
Sent: 10-06-2011 12:35
From: Colleen Kelly
Subject: AUC vs Mixed Modeling
I'm afraid I don't understand the arguments against using AUC and feel that a blanket statement like "mixed models are better than AUC" is unwarranted. A mixed model will not be more powerful if it is not appropriate to model the data at hand. AUC is one of the most commonly used measures of exposure in pharmacokinetics and has an important interpretation. It can be estimated after fitting a mixed model or in a non-compartmental model (it is just the area under the curve after all) and both methods can handle missing data. For pharmacokinetic data, modeling concentrations over time can be complex and the non-parametric model (the non-compartmental model) may be the best model. I think the answer to what is the best model depends on the data and the question of interest: are you more interested in trends over time or exposure?
-------------------------------------------
Colleen Kelly
Principal Consultant
Kelly Statistical Consulting
-------------------------------------------
Original Message:
Sent: 10-06-2011 09:42
From: Michael Chernick
Subject: AUC vs Mixed Modeling
I am familiar with many of the recommended references. They are all good recommendations. Places where the use of AUC shows up and is appropriate are bioequivalence testing of two treatments and comparing Operating Characteristic curves. For longitudinal data, a mixed linear model with repeated measures is usually the choice.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 09:21
From: Michael Griswold
Subject: AUC vs Mixed Modeling
Completely agree with Michael & Alex, nice work & keep going with these recommendations. There are of course a ton of books out there on the subject, but LDA references I usually go with are Diggle, Heagerty, Liang & Zeger (2002) and the Verbeke & Molenberghs duo on linear & discrete longitudinal models (2001 & 2005). For this particular question though (paired t-test vs mixed model), Robert Weiss has a nice chapter on "Critiques of Simple Analyses" in his book: Modeling Longitudinal Data (2005).
-------------------------------------------
Michael Griswold
Executive Director
Univ MS Medical Biostatistics
-------------------------------------------
Original Message:
Sent: 10-06-2011 08:14
From: Michael Chernick
Subject: AUC vs Mixed Modeling
I agree with you about the mixed model. I think the mixed model is the best approach and it does handle missing data as long as the missing at random assumption is valid. If you have non-ignorable missingness I think any method that does not model the mechanism for missingness will have problems. So I don't understand why the AUC approach would be appropriate.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 10-06-2011 07:49
From: Alexandra Hanlon
Subject: AUC vs Mixed Modeling
Hi Everyone,
I too am appreciative of your thoughtful exchange of views. I have a question for you to consider -- in critiquing statistical methodology supporting the comparison of longitudinal patient reported outcomes by intervention, I frequently encounter the use of the AUC as a summary measure, with group comparisons based on a two-sample t-test. I consistently recommend the use of mixed modeling to take into account full information and increased power. I understand that missing data in this scenario is frequent and should be considered in the choice of appropriate methodology. Can you comment on various situations when you might recommend one approach over the other? Thanks so much.
-------------------------------------------
Alexandra L. Hanlon
Associate Research Professor
University of Pennsylvania
-------------------------------------------