Even AUROC sometimes isn't practical enough... it can still be difficult to make a business case with, since business expectations are often measured in dollars and can come with additional constraints that do not necessarily impact model estimation but do impact the implementation of the decisions based on the model.
There was a case in which the AUROC of one model was in the upper 80s and the other in the upper 70s (different designs and approaches). The model with AUROC in the upper 70s was clearly the winner because of how it performed around the constraints. In effect, the model with AUROC in the upper 80s did better (and by quite a bit) where it didn't matter in practice--outside of the so-called "operating range". The objective function here was dollars, and there were constraints on the available workload and the associated costs of implementing the decision. This was obviously not a "statistical" factor per se, but one that had an impact on a statistical decision (i.e., the selection of the final model). Sort of a poor man's constrained optimization, if you will.
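To make the "operating range" idea concrete, here is a minimal numpy sketch of a partial AUROC that only gives credit for performance inside a feasible false-positive range. The 0.5 cutoff in the example and the normalization convention are illustrative assumptions, not details from the actual case:

```python
import numpy as np

def partial_auc(y_true, scores, max_fpr=0.2):
    """AUROC restricted to FPR <= max_fpr, normalized so that 1.0
    means perfect ranking within the operating range."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores)            # rank by score, descending
    y = y_true[order]
    tpr = np.cumsum(y) / y.sum()           # true positive rate per threshold
    fpr = np.cumsum(1 - y) / (1 - y).sum() # false positive rate per threshold
    mask = fpr <= max_fpr                  # keep only the operating range
    last_tpr = tpr[mask][-1] if mask.any() else 0.0
    fpr_seg = np.concatenate(([0.0], fpr[mask], [max_fpr]))
    tpr_seg = np.concatenate(([0.0], tpr[mask], [last_tpr]))
    # trapezoid rule over the restricted segment
    area = np.sum(0.5 * (tpr_seg[1:] + tpr_seg[:-1]) * np.diff(fpr_seg))
    return area / max_fpr                  # normalize by best achievable area
```

A model that dominates only above `max_fpr` scores poorly here even if its full-range AUROC is higher, which is exactly the flavor of comparison described above.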
P.S. I have come across scenarios in which percent accuracy in linear regression actually did translate to dollars and had indirect regulatory implications; I have actually seen models rejected for lack of accuracy despite reasonable MSEs and such. So my question to the OP would be why the client is asking for it and what the intended use of the model is.
Original Message:
Sent: 06-28-2023 16:25
From: Glen Colopy
Subject: Accuracy for regression models
To expiate my sin of self-reference, here is another example of medical research that's grappling with the fact that the model is optimized for an *analytically-tractable* metric that's less important than an alternative metric (that's less analytically tractable).
In this case it's (i) a logistic regression model with standard MLE training, versus (ii) AUROC as the more useful metric.
Here's the article link: Developing biomarker combinations in multicenter studies via direct maximization and penalization - PubMed
Related work by Allison Meisner and others:
- https://pubmed.ncbi.nlm.nih.gov/29344362/
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499057/
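The MLE-vs-AUROC tension can be illustrated with a toy sketch (the numbers here are made up, not from the papers above): the Bernoulli log-likelihood that MLE maximizes rewards calibration, while AUROC depends only on the ordering of the scores, so the two can disagree about which model is better.

```python
import numpy as np

def log_likelihood(y, p):
    """Bernoulli log-likelihood -- the objective MLE training maximizes."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def auroc(y, p):
    """Rank-based AUROC (Mann-Whitney statistic) -- depends only on ordering."""
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

y = np.array([0, 0, 1, 1])
p_a = np.array([0.10, 0.40, 0.35, 0.80])  # well spread, one ranking error
p_b = np.array([0.45, 0.46, 0.54, 0.55])  # perfect ranking, barely separated
```

Model A has the higher log-likelihood but model B has the perfect AUROC, so the "best" model depends entirely on which metric matters for the application.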
------------------------------
Glen Wright Colopy
DPhil Oxon
Host | The Data & Science Podcast
Head of Data Science | Alesca Life Tech Ltd
Original Message:
Sent: 06-28-2023 16:13
From: Michiko Wolcott
Subject: Accuracy for regression models
As a quick second to Glen's quick second ;-) I feel--heck, I KNOW--this is not talked about enough in the applied world. I would personally like to see more formal coverage of the topic in applied stats courses--it doesn't have to be extensive, just enough to make one think. There are many different ways to measure the "success" of a model; or rather, how you measure the "success" of a model depends on its practical objective. In the end, what ultimately matters is how the decisions based on the model impact things that are real. Some of those metrics may not be "statistical" at all, so our responsibility becomes ensuring the model itself is "statistically sound" while maximizing whatever objective function is of interest....
------------------------------
Michiko Wolcott
Principal Consultant
Msight Analytics
Original Message:
Sent: 06-28-2023 15:42
From: Glen Colopy
Subject: Accuracy for regression models
As a quick second on Andrew's comment with a very similar example to his (& apologies in advance for the self-citation), here's an example where....
- a Gaussian process regression model* is constantly being re-fit to new data using maximum a posteriori (MAP)
- but the "ultimate performance" of the model isn't the overall fit of the data with respect to the posterior predictive distribution... it's the lower quantiles of that performance.
- In other words, for our clinical situation, we don't care if the model is very accurate & precise most of the time (on the easy patients), we want to reduce the severity of the very worst predictions.
- So we developed an algo to identify models that optimize this analytical metric (in real time).
Link to Downloadable PDF: https://ieeexplore.ieee.org/document/8226743
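The real-time algorithm is in the paper; purely as a toy illustration of the tail-focused idea (the quantile level and the numbers are my own assumptions, not from the paper), a worst-case criterion for a regression model might look like:

```python
import numpy as np

def tail_severity(y_true, y_pred, q=0.95):
    """Upper quantile of absolute prediction error (lower is better).
    Scores a model by the severity of its worst predictions rather
    than by its average-case accuracy."""
    errs = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return np.quantile(errs, q)

# Toy comparison: model A is better on average but has a heavy error tail;
# model B is mediocre everywhere but never terrible.
y_true = np.zeros(20)
errs_a = np.array([0.05] * 18 + [5.0, 5.0])  # great mostly, awful twice
errs_b = np.full(20, 1.0)                    # uniformly mediocre
```

By mean absolute error model A wins; by the 95th-percentile criterion model B wins -- the same trade-off as preferring the model that protects the hard patients over the one that shines on the easy ones.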
I like the value of Andrew's follow-up analysis. Frequently the important part of a model is what it does when it gets things wrong, not how well it gets most things right. That's why it's important not to conflate "model performance metric" with "customer/user/business value".
* If you're not familiar with GP regression, just think of it as linear regression over the kernel space... plop a kernel on each of the data points & regress over those so you can estimate non-linear functions without (m)any icky assumptions.
------------------------------
Glen Wright Colopy
DPhil Oxon
Host | The Data & Science Podcast
Head of Data Science | Alesca Life Tech Ltd
Original Message:
Sent: 06-28-2023 14:57
From: Andrew Ekstrom
Subject: Accuracy for regression models
How much data do you have?
A previous employer demanded to know the accuracy of a regression model. I told them the R^2 value was 0.99978, or whatever it was. That was what they wanted.
Something I did out of curiosity was to build the model on training data, then look at how many of the test data points for Y fall within +/- 1 SD, 2 SD, etc., or within some relevant tolerance, say Y_hat +/- 5%, 10%, etc.
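That check could look something like this sketch (the function name, tolerances, and numbers are illustrative; `resid_sd` would come from the residual SD of the training fit):

```python
import numpy as np

def coverage_report(y_test, y_hat, resid_sd, pct=(0.05, 0.10)):
    """Fraction of held-out points falling within +/- k*SD bands and
    within +/- p% of the prediction -- an 'accuracy' readable as a %."""
    y_test = np.asarray(y_test, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = np.abs(y_test - y_hat)
    report = {}
    for k in (1, 2, 3):
        report[f"within {k} SD"] = np.mean(err <= k * resid_sd)
    for p in pct:
        report[f"within {p:.0%}"] = np.mean(err <= p * np.abs(y_hat))
    return report

rep = coverage_report([10, 10, 10, 10], [10.4, 9.6, 11.5, 10.0], resid_sd=0.5)
```

Each entry is a plain percentage ("75% of held-out points were within 5% of the prediction"), which is often easier to put in front of a client than an RMSE.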
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 06-28-2023 13:01
From: Chris Comora
Subject: Accuracy for regression models
Hello everyone,
I am doing some modeling for a client (linear regression) and they are insistent on being able to evaluate the model based on accuracy. I've tried talking to them about measuring the model's performance with RMSE, MSE, etc., but they want to see its 'accuracy' expressed as a percentage. Attempts to explain the difference between how we evaluate classification and regression models haven't been successful yet. Any advice from those who might have dealt with a similar issue?
------------------------------
Chris Comora
North Carolina State University
------------------------------