Original Message:
Sent: 05-28-2015 15:14
From: Adam Kovarik
Subject: Multinomial Logistic Model Question
Walter –
First of all, thank you for the response. Yes, the model is classifying observations by the largest value of p(class). I understand what you are pointing out. In applying this model for predictive purposes, what methodology would I use if not choosing the class with the largest probability? Would I choose the class with the largest delta from the prior probabilities?
I am not very well versed in multinomial logistic regression, but I thought I would attempt it in this scenario. Maybe it makes more sense to use a decision tree approach?
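To make sure I'm describing the alternative clearly, here's a rough Python sketch of what I mean. The numbers are just the ones from your example, and the "largest delta from the prior" rule (implemented here as the largest ratio of predicted probability to prior) is my guess at what such a method might look like, not an established procedure:

```python
# Illustrative only: two ways to turn predicted class probabilities
# into a class label. Priors and predictions are from Walter's example.
priors = {"A": 0.15, "B": 0.35, "C": 0.50}
predicted = {"A": 0.17, "B": 0.37, "C": 0.46}  # one observation

# Rule 1: pick the class with the largest predicted probability.
argmax_class = max(predicted, key=predicted.get)

# Rule 2: pick the class whose probability rose the most relative
# to its prior, i.e. the largest ratio p(class) / prior(class).
ratio_class = max(predicted, key=lambda k: predicted[k] / priors[k])

print(argmax_class)  # C wins on raw probability
print(ratio_class)   # A wins on lift over the prior (0.17/0.15 vs 0.46/0.50)
```

Under rule 2 this observation would be called an A, since its p(A) is the furthest above baseline, even though C has the largest raw probability.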
------------------------------
Adam
------------------------------
Original Message:
Sent: 05-28-2015 03:31
From: Walter Davis
Subject: Multinomial Logistic Model Question
It's been a long time since I fit a multinomial logistic for real but ... I assume it comes down to the criterion the program (default?) uses to assign an observation to a predicted group membership.
That is, the baseline distribution of categories A, B, C in your data is roughly 15%, 35%, 50%. For any given observation, the model will produce a predicted probability that it belongs to each category. For example, an observation might come out with predicted probabilities for A, B, C of .17, .37, .46. We might look at this as an observation less likely to be a C than the typical observation, but I'm guessing the default in the program is to assign it to the most likely category -- in this case C.
You'll note you get almost none predicted for the first category. If my guess is correct, that's because an observation won't be assigned to category A unless p(A) is at least .34 ... and maybe higher since it still needs to be higher than p(B) ... i.e. even if p(A) was .34 it might well be that p(B) is .4 and it gets predicted to be a B.
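Here's a toy illustration of the argmax rule I'm describing (in Python, since I don't have SAS in front of me). Each row is one observation's predicted probabilities for (A, B, C); the numbers are invented to show how high p(A) has to be before A is actually predicted:

```python
# Toy illustration of assigning each observation to its most
# probable category. Probabilities here are made up, not model output.
labels = ["A", "B", "C"]
observations = [
    (0.17, 0.37, 0.46),  # the example above: C is predicted
    (0.30, 0.32, 0.38),  # p(A) is double its .15 baseline, still loses
    (0.34, 0.40, 0.26),  # p(A) = .34, but p(B) = .40, so B is predicted
    (0.45, 0.30, 0.25),  # only a p(A) this high actually wins
]

for probs in observations:
    predicted = labels[probs.index(max(probs))]
    print(probs, "->", predicted)
```

Only the last observation gets assigned to A, which is the mechanism behind the near-empty first column in your classification matrix.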
Put yourself in the computer's shoes. Fisher himself might come to you from beyond the grave and tell you that this person's true probabilities are .17, .37 and .46 -- there's no model error, that's the truth. What category do you assign that person to? What do you do when Fisher tells you the next person's probabilities are also .17, .37 and .46? Such is the nature of probability -- even if I know the true probabilities, I'm going to be wrong a lot.
In short, that table is probably not a particularly useful diagnostic for model fit -- at least if it's constructed the way I think. I've never been a fan of assigning a predicted category based on probabilities -- just find some way to capture the predicted probability distribution for different types/groups of observations. I don't know if there's a multinomial equivalent of the Hosmer-Lemeshow test but something like that probably gives a better feel for how well the model is doing.
As a simple check, look at the distributions of the predicted p(A), p(B) and p(C). Are you getting a "good" spread of p(A)s across observations? If so, your model is probably doing a good job. On the other hand, if most of the observations have predicted p(A), p(B), p(C) around .15, .35 and .5 then the model is not doing much to distinguish among observations.
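A crude way to do that check, sketched in Python with fabricated predicted probabilities (any real version would pull these from your scored data set):

```python
# Compare the spread of predicted p(A) across observations in two
# hypothetical models. Both lists are fabricated for illustration.
p_a_flat = [0.14, 0.15, 0.16, 0.15, 0.14, 0.16]    # hugs the .15 baseline
p_a_spread = [0.02, 0.08, 0.15, 0.30, 0.55, 0.70]  # varies by observation

def spread(probs):
    """Range of predicted probabilities: a rough measure of how much
    the model distinguishes among observations for this class."""
    return max(probs) - min(probs)

print(round(spread(p_a_flat), 2))    # tiny: model adds little beyond the baseline
print(round(spread(p_a_spread), 2))  # large: model is separating observations
```

If your p(A) column looks like the first list -- everyone near .15 -- the model isn't learning much about category A, regardless of what the classification matrix says.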
------------------------------
Walter Davis
Senior Research Fellow
National Institute for Applied Statistics Research Australia
------------------------------
Original Message:
Sent: 05-27-2015 11:38
From: Adam Kovarik
Subject: Multinomial Logistic Model Question
First time poster, hope I’m doing this right…
I put together a multinomial logistic regression model to identify influential variables in separating observations across 3 classes. After several adjustments I landed on a significant model with 10 independent variables (6 categorical and 4 continuous).
However, when I scored the test set and produced a classification matrix, the results were subpar. The model is strongly weighted towards classifying observations from all original classes into the third class. Does anyone have ideas on what steps I should take to avoid this issue? Is this simply the result of a weak model?
SAS Output: (classification matrix not shown)
------------------------------
Adam
------------------------------