ASA Connect

  • 1.  Multinomial Logistic Model Question

    Posted 05-27-2015 11:39

    First time poster, hope I’m doing this right…

    I put together a multinomial logistic regression model to identify influential variables for separating three classes.  After several adjustments I landed on a significant model with 10 independent variables (6 categorical and 4 continuous).

    However, when I scored the test set and produced a classification matrix, the results were subpar.  The model is strongly weighted toward classifying observations from all of the original classes into the third class.  Does anyone have an idea of what steps I should take to avoid this issue?  Is this simply the result of a weak model?

    SAS Output:

    Table of F_TERMS by I_TERMS (frequency / row percent)

    F_TERMS          I_TERMS (Into: TERMS)
    (From: TERMS)    0            1-2           3 +            Total
    0                15 ( 3.33)   142 (31.49)    294 (65.19)    451
    1-2              16 ( 1.57)   340 (33.27)    666 (65.17)   1022
    3 +               2 ( 0.14)   253 (17.61)   1182 (82.25)   1437
    Total            33           735           2142           2910



    ------------------------------
    Adam
    ------------------------------

     



  • 2.  RE: Multinomial Logistic Model Question

    Posted 05-28-2015 03:32

    It's been a long time since I fit a multinomial logistic for real but ... I assume it comes down to the criterion the program (default?) uses to assign an observation to a predicted group membership.

    That is, the baseline distribution of categories A, B, C in your data is roughly 15%, 35%, 50%.  For any given observation, the model will produce a predicted probability that it belongs to each category.  For example, an observation might come out with predicted probabilities of A,B,C of .17, .37, .46.  We might look at this as an observation less likely to be a C than the typical observation but I'm guessing the default in the program is to assign it to the most likely category -- in this case C.

    You'll note you get almost none predicted for the first category.  If my guess is correct, that's because an observation won't be assigned to category A unless p(A) is at least .34 ... and maybe higher since it still needs to be higher than p(B) ... i.e. even if p(A) was .34 it might well be that p(B) is .4 and it gets predicted to be a B.

    Put yourself in the computer's shoes.  Fisher himself might come to you from beyond the grave and tell you that this person's true probabilities are .17, .37 and .46 -- there's no model error, that's the truth.  What category do you assign that person to? What do you do when Fisher tells you the next person's probabilities are also .17, .37 and .46? Such is the nature of probability -- even if I know the true probabilities, I'm going to be wrong a lot.

    In short, that table is probably not a particularly useful diagnostic for model fit -- at least if it's constructed the way I think.  I've never been a fan of assigning a predicted category based on probabilities -- just find some way to capture the predicted probability distribution for different types/groups of observations.  I don't know if there's a multinomial equivalent of the Hosmer-Lemeshow test but something like that probably gives a better feel for how well the model is doing.

    As a simple check, look at the distributions of the predicted p(A), p(B) and p(C).  Are you getting a "good" spread of p(A)s across observations?  If so, your model is probably doing a good job.  On the other hand, if most of the observations have predicted p(A), p(B), p(C) around .15, .35 and .5 then the model is not doing much to distinguish among observations.
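
    Walter's two points -- that assigning each observation to its most probable class will almost never pick a minority class, and that the spread of the predicted probabilities is the better diagnostic -- can be sketched with a small hypothetical example.  This is plain Python with made-up probabilities, not the poster's SAS setup:

    ```python
    # Hypothetical predicted probabilities (p_A, p_B, p_C) for five
    # observations; baseline class rates are roughly .15 / .35 / .50.
    preds = [
        (0.17, 0.37, 0.46),
        (0.30, 0.35, 0.35),   # p_A well above its .15 baseline ...
        (0.10, 0.30, 0.60),
        (0.34, 0.26, 0.40),   # ... or even at .34 ...
        (0.12, 0.40, 0.48),
    ]
    labels = ["A", "B", "C"]

    # Default rule: assign each observation to its most probable class.
    assigned = [labels[p.index(max(p))] for p in preds]
    print(assigned)            # class A is never chosen, even at p_A = .34

    # A better diagnostic: the spread of p_A across observations.
    p_a = [p[0] for p in preds]
    print(min(p_a), max(p_a))  # a wide spread means the model does separate
    ```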


    ------------------------------
    Walter Davis
    Senior Research Fellow
    National Institute for Applied Statistics Research Australia
    ------------------------------




  • 3.  RE: Multinomial Logistic Model Question

    Posted 05-28-2015 15:15

    Walter –

    First of all, thank you for the response.  Yes, the model is classifying observations by the largest value of p(class).  I understand what you are pointing out.  In applying this model for predictive purposes, what methodology would I use if not choosing the class with the largest probability?  Would I choose the class with the largest delta from the prior probabilities?

    I am not very well versed in multinomial logistic regression, but I thought I would attempt it in this scenario.  Maybe it makes more sense to use a decision tree approach?
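
    One way to formalize the "largest delta from the priors" idea is to assign each observation to the class with the largest lift over its baseline rate, p(class) / prior(class), instead of the largest raw probability.  A minimal sketch with illustrative numbers (not from the poster's model), using Walter's example observation:

    ```python
    # Hypothetical priors and one observation's predicted probabilities.
    priors = {"A": 0.15, "B": 0.35, "C": 0.50}
    probs  = {"A": 0.17, "B": 0.37, "C": 0.46}

    # Default rule: largest predicted probability.
    argmax_class = max(probs, key=probs.get)

    # Prior-adjusted rule: largest lift over the class's baseline rate.
    lift_class = max(probs, key=lambda k: probs[k] / priors[k])

    print(argmax_class, lift_class)   # the two rules can disagree
    ```

    Here the default rule picks C (0.46 is the largest probability), while the lift rule picks A (0.17 / 0.15 is the largest ratio) -- the same observation Walter described as "less likely to be a C than the typical observation."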



    ------------------------------
    Adam
    ------------------------------




  • 4.  RE: Multinomial Logistic Model Question

    Posted 05-29-2015 10:32

    Are you using a PRIORS option?  What SAS proc are you using?



    ------------------------------
    Dallas Johnson
    Professor of Statistics
    Kansas State University
    ------------------------------




  • 5.  RE: Multinomial Logistic Model Question

    Posted 05-28-2015 08:50

    Hi, Adam.

    Can you get a table showing classification probabilities for individuals?  I'd guess the problem is that the model, even if not "weak", isn't strong enough to give very many a relatively high probability of being in category 0, since there are relatively few of them.

    Ed

    ------------------------------
    Edward Gracely
    Drexel University
    ------------------------------




  • 6.  RE: Multinomial Logistic Model Question

    Posted 05-28-2015 15:15

    Edward, thank you for chipping in.  The prior probability of "0" is only 15.5%.  The simple statistics on p(0) show me that there is some spread (range [.007, .448], mean 0.154, s = .074); however, it is definitely centered heavily around 0.155.  To my earlier point with Walter, perhaps I need to view this classification model in a different fashion?



    ------------------------------
    Adam
    ------------------------------




  • 7.  RE: Multinomial Logistic Model Question

    Posted 05-29-2015 09:25

    You don't say how you arrived at your 10 predictors. Were you creating them by feature engineering or selecting them from a larger set?

    I have had good success using cv.glmnet from the R package glmnet to do lasso-regularized multinomial regression (alpha=1) with small data sets and many potential predictors.

    You might try creating a balanced training set by sampling from the more frequent classes or duplicating training cases from the less frequent classes.
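
    The balanced-training-set idea can be sketched in a few lines.  This is a generic oversampling illustration in plain Python (labels and class sizes taken from the poster's table; features are omitted for brevity), not the glmnet workflow itself:

    ```python
    import random

    random.seed(0)

    # Hypothetical imbalanced training set: (label, row id) pairs with
    # the class sizes from the original classification table.
    train = [("0", i) for i in range(451)] \
          + [("1-2", i) for i in range(1022)] \
          + [("3 +", i) for i in range(1437)]

    by_class = {}
    for label, row in train:
        by_class.setdefault(label, []).append((label, row))

    # Duplicate cases from the less frequent classes (sampling with
    # replacement) until every class matches the largest class.
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced += rows
        balanced += random.choices(rows, k=target - len(rows))

    counts = {label: sum(1 for l, _ in balanced if l == label)
              for label in by_class}
    print(counts)   # every class now has 1437 cases
    ```

    Sampling down from the more frequent classes works the same way in reverse (draw without replacement down to the smallest class size) and avoids duplicated rows at the cost of discarding data.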

    ------------------------------
    Kent Johnson
    ------------------------------




  • 8.  RE: Multinomial Logistic Model Question

    Posted 06-01-2015 08:32

    I had a similar situation a few months ago.  I had a set of continuous predictors and an ordinal clinical grade.  My estimates for the predictors were good, but the classification rule put most of the hold-out cases in the most frequent group.  I tried proportional odds and multinomial models.  Weighting the training data is valuable but might not yield better classification.  Since estimation of predictors was not my major concern, I tried other techniques like trees and neural nets.  Eventually we settled on naive Bayes, but I still had to weight my sampling set.

    ------------------------------
    Georgette Asherman
    ------------------------------




  • 9.  RE: Multinomial Logistic Model Question

    Posted 06-01-2015 12:21

    Adam, have you tried looking at cluster analysis?

    ------------------------------
    Aubrey Magoun
    Consultant
    Applied Research & Analysis, Inc.
    ------------------------------




  • 10.  RE: Multinomial Logistic Model Question

    Posted 06-03-2015 09:41

    Good morning everyone!

    Looks like if I'm going to continue down the logistic path I should weight my sample.  Theoretically speaking, how does weighting my sample for approximately equal priors influence the classification on the test sample?
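
    A rough sketch of the effect: fitting on an equal-prior sample shifts the model's baseline, so predicted probabilities are no longer anchored at the original class rates and the most-probable-class rule picks minority classes more often.  Scores on the original scale can be recovered by reweighting by the true priors.  All numbers below are hypothetical:

    ```python
    # A model fit on an equal-prior (1/3 each) sample might score one
    # test observation like this:
    balanced_probs = {"0": 0.40, "1-2": 0.33, "3 +": 0.27}
    true_priors    = {"0": 0.155, "1-2": 0.351, "3 +": 0.494}

    # Under the balanced model, argmax now picks the minority class.
    balanced_pick = max(balanced_probs, key=balanced_probs.get)

    # To recover probabilities on the original scale, reweight by the
    # true priors (relative to the equal training priors of 1/3) and
    # renormalize.
    rescaled = {k: balanced_probs[k] * true_priors[k] / (1 / 3)
                for k in balanced_probs}
    total = sum(rescaled.values())
    rescaled = {k: v / total for k, v in rescaled.items()}
    rescaled_pick = max(rescaled, key=rescaled.get)

    print(balanced_pick, rescaled_pick)   # the two scales disagree
    ```

    So weighting trades overall accuracy on the test set (which the majority class dominates) for better detection of the rare classes; which trade-off is right depends on the cost of each kind of misclassification.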

    Aubrey, I haven't specifically thought of using cluster analysis.  Are you suggesting I should aim to reduce my explanatory variables?  I didn't think having 10 predictors, given that I have over 6k records in my training set, was going to cause any issues.

    Thank you everyone for submitting ideas.  This is a tremendously helpful exercise for me.


    ------------------------------
    Adam
    ------------------------------