Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

  • 1.  Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 17:12

    Over the past few years, there has been an explosion in the number of statistical methods geared toward accurate prediction, often at the expense of model interpretability. Methods like random forests, neural networks, and gradient boosting machines (not to mention ensemble methods that combine predictions across these different methods) often outperform simpler models that can be interpreted (e.g., regression models, trees, and factor analysis).

    This month, I’d like to discuss the trade-offs between predictive and explanatory models. In my experience, clients often ask for a highly predictive model but later want to understand the inner workings of the “black box.” In other cases, I’ve had clients ask for a simple explanatory model and then ask whether prediction could be improved using more advanced methods, even if the result is less interpretable.

    Feel free to share anything relevant to the topic – experiences you’ve had with clients, strategies you use to decide on a method, additional questions you have in mind.  Here are a few questions to get things started:

    • How do you decide when a predictive model is the best choice?
    • How do you mitigate the risk that a client will change his/her mind about what model is needed to solve the business problem?
    • What methods do you think are most useful, in your experience, for each task? Why?
    • In different fields (e.g., banking, pharma, finance, government, general consulting), what type of method is used most? If you were advising a student entering one of these fields, which methods would you recommend they focus on?

    I look forward to hearing your thoughts and ideas.

    -Chris

    ------------------------------
    Christopher Holloman
    Information Control Company
    ------------------------------


  • 2.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 17:33

    I'm going through this right now. I am working with survey data, and the company wants to find ways to improve customer satisfaction. Between 95% and 99% of the customer data I looked at have a positive outcome. I had to tell them that looking only at the bad outcomes does not mean much: you'll find many similarities among those customers, but the same traits may be just as common among the happy customers.
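
    To make that point concrete: a trait that shows up in 70% of unhappy customers tells you nothing if it also shows up in 70% of happy customers. The counts below are invented for illustration, not taken from the survey; the check they show (compare the trait's prevalence in both outcome groups, or equivalently the relative risk) is exactly what looking only at bad outcomes skips.

```python
# Invented counts for illustration: 10,000 responses, 3% unhappy,
# and one trait (say, "opened more than one support ticket").
unhappy_total, happy_total = 300, 9_700
unhappy_with_trait, happy_with_trait = 210, 6_790

# Prevalence of the trait within each outcome group.
share_unhappy = unhappy_with_trait / unhappy_total  # 0.70
share_happy = happy_with_trait / happy_total        # 0.70

# Chance of being unhappy with vs. without the trait.
risk_with = unhappy_with_trait / (unhappy_with_trait + happy_with_trait)
risk_without = (unhappy_total - unhappy_with_trait) / (
    (unhappy_total - unhappy_with_trait) + (happy_total - happy_with_trait)
)
relative_risk = risk_with / risk_without

print(f"trait among unhappy: {share_unhappy:.2f}, among happy: {share_happy:.2f}")
print(f"relative risk of being unhappy given the trait: {relative_risk:.2f}")
```

    A relative risk of 1 means the trait, however common among unhappy customers, does not distinguish them from happy ones.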

    A typical logistic regression does a really poor job of predicting bad outcomes. So do CART models and random forests. Neural networks are not much better either. Basically, I am trying to model random noise. So it's "good" for them... kind of.

    I've seen how bad logistic regression can be for predicting and modeling outcomes with small event probabilities. As the variability increases, logistic regressions get worse. To assess accuracy, I'm using a confusion matrix with my logistic regressions, and I'm not very happy with the results. Even "textbook data" tends to fail a confusion matrix analysis. That makes me suspicious of logistic regression models in general.
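
    A confusion matrix is always computed at some cutoff, and with rare events the conventional 0.5 cutoff can classify nearly every case as a non-event, which gives 90% accuracy and 0% sensitivity at the same time. The sketch below uses invented fitted probabilities (not output from any real model) to show how the same probabilities produce very different matrices at a 0.5 cutoff and at a cutoff near the event rate. It does not make a weak model better; it only separates the model's ranking of cases from the choice of cutoff.

```python
def confusion_matrix(y_true, p_hat, cutoff=0.5):
    """Return (tp, fp, fn, tn) when probabilities >= cutoff are classified as events."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, p_hat):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Invented fitted probabilities: 2 events among 20 cases (10% prevalence).
y_true = [1, 1] + [0] * 18
p_hat = [0.35, 0.20,
         0.02, 0.05, 0.08, 0.10, 0.03, 0.04, 0.06, 0.12, 0.15,
         0.07, 0.09, 0.11, 0.05, 0.25, 0.04, 0.06, 0.03, 0.08]

for cutoff in (0.5, 0.1):
    tp, fp, fn, tn = confusion_matrix(y_true, p_hat, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"cutoff {cutoff}: TP={tp} FP={fp} FN={fn} TN={tn}, "
          f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```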

    CART models tend to be easy to understand and do well for modeling purposes. Random forests help with overfitting and make models more robust. I don't like neural networks: too difficult to deal with.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 3.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 17:59

    You might try log-linear models if your data are all categorical. Log-linear models are used to model contingency tables; they find the important interactions between categorical variables. I used the book Discrete Multivariate Analysis, by Bishop, Fienberg, and Holland (MIT Press), to learn the method.
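
    For a two-way table, the simplest log-linear model is independence, and testing it against the saturated model is a test of whether the row-by-column interaction term is needed. The sketch below, using invented tables, computes the independence fit in closed form and the likelihood-ratio statistic G^2. For three-way and larger tables, hierarchical log-linear models are usually fitted by iterative proportional fitting or as Poisson regressions, which this sketch does not attempt.

```python
import math

def independence_fit(table):
    """Fitted counts for the log-linear independence model
    log(mu_ij) = u + u_row(i) + u_col(j), i.e. mu_ij = row_i * col_j / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]

def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * log(O / E)) for independence
    vs. the saturated model (the one that adds the row-by-column interaction)."""
    fitted = independence_fit(table)
    g2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for o, e in zip(obs_row, fit_row):
            if o > 0:  # cells with O = 0 contribute nothing
                g2 += 2.0 * o * math.log(o / e)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return g2, df

# Invented 2x2 tables (rows and columns are two categorical survey items).
no_interaction = [[10, 20], [30, 60]]
strong_interaction = [[30, 10], [10, 30]]
print(g_squared(no_interaction))      # G^2 = 0 on 1 df: independence fits exactly
print(g_squared(strong_interaction))  # G^2 well above 3.84, the 5% chi-square cutoff on 1 df
```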

    ------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics



  • 4.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 18:33

    Thank you so much, Chris, for starting this particular dialogue. This is an area I have been working with and thinking about for some time, and I plan to follow the discussion closely.

    While other people discuss the appropriateness and use of particular terminology and methods in their current work, I would like to start a side discussion on ensemble modeling.

    Mostly, I want to ask the audience about so-called 'Super Learning Models', which seem to take the idea of ensemble models to the extreme and use aspects of every model that one can posit. I'm not an expert, so if others understand this better, please educate me, but the idea seems to be 'take the strengths from all other models'. Researchers like van der Laan and others continue to show that super learning holds great promise, but I have a philosophical issue with the use of this and other such ensemble methods.

    One underlying concept of statistical modeling is that the data come from some sort of 'true' process, and the goal of a multivariable statistical model is somehow to estimate or elucidate the true impact of various predictor variables on some target response. The founders of our discipline suggested that maybe only God or some higher being would ever know the true relationship between predictors and response, and that we were all just making educated guesses.

    So I start with the idea that there is some 'true' model for the association between my set of predictors and my response of interest. If I fit least squares regression models, then I am assuming that the form of that 'true' model is maybe linear. If I fit nonlinear models, then maybe I assume that the association is nonlinear but has some sort of functional form. Tree-based methods assume that the 'true' model has maybe a series of threshold-style effects instead of some sort of function. What I am trying to say is that picking a model involves a trade-off in what one assumes about the 'truth'.

    So how do ensemble models fit into that ideology? I can reconcile model averaging as a statistician-gambler who decides to hedge his bets by allowing a bunch of different assumptions to have a voice in the discussion. But Super Learning (as I understand it) takes the 'best' predictions from each model in its purview. So when linear models work best, use those, and when threshold models work best, use those. This is a concept I just cannot reconcile with this framework. I don't see the 'true' model as a moving target from which to take piecemeal parts of the different models being considered. To me this model seems more akin to a chimera from mythology.

    So I guess my question is whether anyone else has considered using these methods, and whether anyone shares or can ease my concerns?
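
    For readers trying to pin down what a super learner actually computes: as described by van der Laan and colleagues, it is a form of stacking. Each candidate model produces cross-validated predictions, and a metalearner picks the combination of those predictions with the lowest cross-validated risk; a common choice is a convex (non-negative, sum-to-one) weighting, and the "discrete" super learner simply picks the single candidate with the lowest cross-validated risk. The sketch below is a minimal version under stated assumptions: two toy base learners (a constant mean and a straight line), simulated data, and a grid search over one weight in place of the constrained least-squares fit that real implementations use.

```python
import numpy as np

def cv_predictions(x, y, fit, predict, k=5, seed=0):
    """Out-of-fold predictions for one base learner from k-fold cross-validation."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    preds = np.empty(len(y))
    for f in range(k):
        train, test = folds != f, folds == f
        model = fit(x[train], y[train])
        preds[test] = predict(model, x[test])
    return preds

# Two toy base learners (hypothetical choices; a real library would hold many more).
def fit_mean(x, y):
    return y.mean()

def predict_mean(model, x):
    return np.full(len(x), model)

def fit_line(x, y):
    return np.polyfit(x, y, 1)

def predict_line(model, x):
    return np.polyval(model, x)

def convex_weight(y, p1, p2, grid=np.linspace(0.0, 1.0, 101)):
    """Metalearner: weight w in [0, 1] minimizing CV MSE of w * p1 + (1 - w) * p2."""
    risks = [np.mean((y - (w * p1 + (1 - w) * p2)) ** 2) for w in grid]
    best = int(np.argmin(risks))
    return grid[best], risks[best]

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 200)

# Same seed, so both learners are scored on identical folds.
p_mean = cv_predictions(x, y, fit_mean, predict_mean)
p_line = cv_predictions(x, y, fit_line, predict_line)
w, cv_risk = convex_weight(y, p_mean, p_line)

# Final predictor: refit each learner on all data, then combine with the CV-chosen weight.
mean_model, line_model = fit_mean(x, y), fit_line(x, y)

def super_learner_predict(x_new):
    return w * predict_mean(mean_model, x_new) + (1 - w) * predict_line(line_model, x_new)

print(f"weight on mean learner: {w:.2f}, CV MSE of combination: {cv_risk:.3f}")
```

    In this formulation the combination is a weighted average of whole models rather than a region-by-region selection; because the grid includes w = 0 and w = 1, the combined cross-validated risk can never be worse than that of the better single learner.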

    Jason

    ------------------------------
    Jason Brinkley
    American Institutes for Research



  • 5.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 08:42

    [The word "linear" in linear (least squares) regression refers to the form of the model with respect to the unknown (to be estimated) parameters, not the form of the model with respect to the predictor variables. So one may use linear regression on some models that are in fact nonlinear in terms of the predictors.]
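
    A minimal illustration of that point, assuming noiseless simulated data: the curve below is quadratic in x, but because it is linear in the coefficients, ordinary least squares on the columns 1, x, and x^2 recovers it exactly.

```python
import numpy as np

# Noiseless data from y = 1 + 2x - 0.5x^2: curved in x, but linear in the betas.
x = np.linspace(-3, 3, 25)
y = 1.0 + 2.0 * x - 0.5 * x**2

# Design matrix with columns 1, x, x^2; least squares on these columns
# is still "linear regression" because the model is linear in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # approximately [ 1.   2.  -0.5]
```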

    In general, for a given set of data, more than one model may be developed, each acceptable for *prediction* purposes, but any attempt to *interpret* the models (explain the roles of the predictors) immediately runs into the problem that the different models will have different predictors in them (and different functional forms for those predictors)... and different parameter values. Using models developed for prediction (from observed data) for "interpretation" can lead to unhappy surprises.
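
    A small simulated example of this, assuming two nearly collinear predictors where only x1 drives the response: model A (x1 only) and model B (x2 only) make practically the same predictions but credit different variables, and model C (both) splits the effect between them in a way that depends on the noise, while the sum of its two slopes stays near the true value of 3.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # x2 is almost a copy of x1
y = 3.0 * x1 + rng.normal(scale=1.0, size=n)

def ols(predictors, y):
    """Least-squares fit with an intercept; returns coefficients and fitted values."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

beta_a, fit_a = ols(x1, y)                         # model A: x1 only
beta_b, fit_b = ols(x2, y)                         # model B: x2 only
beta_c, fit_c = ols(np.column_stack([x1, x2]), y)  # model C: both

print("A:", beta_a.round(2), " B:", beta_b.round(2), " C:", beta_c.round(2))
print("correlation of fitted values, A vs B:", np.corrcoef(fit_a, fit_b)[0, 1].round(4))
```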

    ------------------------------
    Wayne Fischer
    Statistician
    University of Texas Medical Branch



  • 6.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 11:45

    The way I like to think about ensemble methods, along with model averaging, is that we are not trying to estimate the model itself when we are trying to develop a predictive model. Rather, we are trying to generate some set of rules for predicting new values, and we care about how close any new observed values are to their predicted values. In this setting, we don't need the actual true model, and trying to identify a single model that will work sets up a situation where you need to identify a model you believe is sufficiently representative of reality. As long as I don't need to understand the underlying mechanism behind my set of models, ensemble methods and model averaging will have lower error rates than any approach that attempts to identify a single model, because the prediction error for the single model includes the model selection error: MSE(prediction) = sigma^2_y + bias^2 + sigma^2_model. Model averaging and ensemble methods work because the "average" model reduces sigma^2_model. While MSE(prediction) is reduced, we have not identified any true model, and we are not trying to identify the one true model. It is therefore critical that anyone who uses this approach not try to interpret any coefficients, since we have not chosen a true model.
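
    A quick simulation of the sigma^2_model term, under a loudly idealized assumption: each of k candidate models makes an unbiased prediction at the same point, with independent errors of variance sigma^2. Averaging then divides that variance by k. Real models' errors are positively correlated, so the actual reduction is smaller, and averaging does nothing about a bias the models share.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 10.0        # true mean response at one new point
sigma_model = 2.0   # assumed SD of each model's error at that point
k, reps = 5, 20_000

# Each row holds predictions from k models whose errors are independent (an idealization).
preds = truth + rng.normal(0.0, sigma_model, size=(reps, k))
single = preds[:, 0]
averaged = preds.mean(axis=1)

print(f"variance of one model's prediction: {single.var():.2f}")    # near sigma^2 = 4
print(f"variance of the {k}-model average:   {averaged.var():.2f}")  # near sigma^2 / k = 0.8
```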

    Robert

    ------------------------------
    Robert Podolsky
    Associate Professor
    Wayne State University



  • 7.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 11:54

    "we are not trying to estimate the model itself when we are trying to develop a predictive model.  Rather, we are trying to generate some set of rules for predicting new values, and we care about how close any new observed values are to their predicted values.  In this setting, we don't need the actual true model, and trying to identify a single model that will work sets up a situation where you need to identify a model you believe is sufficiently representative of reality.  "

    I have a modest disagreement with this. The point, IMHO, of identifying a model that is "true" or based on some theory is that, hopefully, we can avoid some components of the model that are sample-specific. Using a fit-test approach does some of that, true. But a model that is entirely atheoretical is more likely to have sample-specific fitting issues. Feel free to disagree with me...

    ------------------------------
    Paul Thompson
    Director, Methodology and Data Analysis Center
    Sanford Research/USD



  • 8.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 12:16

    Sample-specific fitting is equivalent to overfitting, and this is something one needs to be very careful about. Ensemble methods and model averaging, if done correctly, do account for overfitting. Also, one needs to remember that an ensemble/model average does not identify a single model. It is true that one still needs to worry about the "super model"; however, the super model will do well if it is assembled correctly. BTW, random forests work well to stabilize CART models, and they also do well at estimating prediction error, because they are in effect an ensemble/model-averaging approach.
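
    The prediction-error point refers to the out-of-bag idea: each bootstrap sample leaves out roughly (1 - 1/n)^n, or about 37%, of the cases, and predicting each case using only the models that never saw it gives an error estimate without a separate validation set. The sketch below assumes bagged regression stumps on simulated data, so it omits the random feature selection that distinguishes a random forest from plain bagging; the out-of-bag bookkeeping is the same.

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_bags = 100, 200
x = rng.uniform(0, 10, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

def fit_stump(x, y):
    """Regression stump: the single split on x that minimizes squared error."""
    xs = np.unique(x)
    best = None
    for s in (xs[:-1] + xs[1:]) / 2:            # midpoints between distinct x values
        left, right = y[x <= s], y[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    return best[1:]                              # (split, left mean, right mean)

def predict_stump(model, x):
    split, left_mean, right_mean = model
    return np.where(x <= split, left_mean, right_mean)

oob_sum = np.zeros(n)
oob_count = np.zeros(n)
for _ in range(n_bags):
    idx = rng.integers(0, n, n)                  # bootstrap sample, with replacement
    oob = np.setdiff1d(np.arange(n), idx)        # cases this bag never saw
    model = fit_stump(x[idx], y[idx])
    oob_sum[oob] += predict_stump(model, x[oob])
    oob_count[oob] += 1

covered = oob_count > 0
oob_pred = oob_sum[covered] / oob_count[covered]
oob_mse = np.mean((y[covered] - oob_pred) ** 2)
print(f"average share of cases out of bag: {oob_count.sum() / (n * n_bags):.2f}")  # near 0.37
print(f"out-of-bag MSE estimate: {oob_mse:.3f}")
```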

    ------------------------------
    Robert Podolsky
    Associate Professor
    Wayne State University



  • 9.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 12:50

    All great points. I like Robert's perspective on model averaging. But Super Learning does not average; it instead takes the best predictive components from each model. I *think* the point is to rely on the models that predict more accurately in certain parts of the sample space. So it really is a new extreme where prediction is king. I think there are safeguards in place against sample-specific and other overfitting. I just don't know how I feel about swinging the pendulum so far in the other direction as to say that prediction is the thing I most care about. I think of this as a different class of models than either a random forest or an ensemble model that uses some sort of averaging, and I am not sure that I should.

    ------------------------------
    Jason Brinkley
    American Institutes for Research



  • 10.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 13:42

    Perhaps it will be fun to pour more fuel on the fire;-)

    Suppose you look at David Kleinbaum's "Evans County Data". The response is whether or not someone has heart disease. About 10% of the people in the data are positive for heart disease.

    There are a few factors to test. Kleinbaum also uses a couple of interaction terms in his model. He used a logistic regression model, which is/was a standard method. If a doctor used his model to decide whether someone had heart disease, the doctor would be wrong about most of the people who actually have it.

    I took this data set and split it into 70% training and 30% validation. Of the 50 people in the training set with heart disease, the logistic regression model predicts that 11 have heart disease. In the validation set, 3 of 21 people with heart disease are correctly classified. That raises the question, "What exactly is the logistic regression explaining?" Since it does such a poor job of predicting the outcome, what good is it? It's obviously NOT explaining the effects of the factors on the outcome, heart disease. If it were, I would expect much better predictive ability.

    Now, if I take the same data set, break it up into the same groups, and use a CART model, it correctly identifies 20 of the 50 people with heart disease in the training set and 7 of 21 in the validation set. That is nearly twice as many as the logistic regression in the training set (20 vs. 11) and more than twice as many in the validation set (7 vs. 3).
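
    The comparison above is a comparison of sensitivity, the share of actual heart-disease cases each model flags. Using only the counts reported in this post, the arithmetic is below; sensitivity alone says nothing about how many healthy people each model flags, which the full confusion matrix would show.

```python
# Counts from the post: (heart-disease cases correctly flagged, heart-disease cases).
results = {
    "logistic, training": (11, 50),
    "logistic, validation": (3, 21),
    "CART, training": (20, 50),
    "CART, validation": (7, 21),
}

# Sensitivity (true positive rate) = correctly flagged cases / all actual cases.
sensitivity = {name: hits / cases for name, (hits, cases) in results.items()}
for name, value in sensitivity.items():
    print(f"{name:<22} sensitivity = {value:.2f}")

# How many times as many cases CART flags as logistic regression.
print(f"training: {20 / 11:.2f}x, validation: {7 / 3:.2f}x")
```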

    With this data, Random Trees and Bagging did about as well as the logistic regression. A bit of a disappointment. 

    Boosting was 100% accurate on the training data and about the same as the other methods on the validation data. Clearly, the boosted model overfit the data.

    Now, we can ask why the CART model does a better job. If you look at a typical logistic regression textbook, or talk to a typical biostatistician, you find that they don't like to use interaction terms, for a couple of different reasons.

    1) Our professors stressed the need for "parsimony", and we were warned about the dangers of overfitting the data

    2) We were warned about having high variance inflation factors (VIFs) among the terms in the model

    The CART model uses interactions and doesn't care about making models simple. For those who work with real data, it's very easy to see that reality is NOT simple! If it were simple, we, as statisticians, would be useless. There would be no more data to analyze, only data to fit into models. We'd be like accountants.

    However, if we center our data (i.e., subtract the mean of a factor from every value of that factor), we can create loads of interaction and quadratic terms from those centered factors without high VIFs for most of the terms.
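
    In a model with just x and x^2, each term's VIF is 1/(1 - r^2), where r is the correlation between x and x^2. The sketch below, with an invented predictor taking the values 1 to 20, shows how much of that collinearity centering removes. The exact-zero correlation after centering relies on x being symmetric about its mean; for skewed predictors, centering reduces this collinearity without eliminating it. Centering also leaves the fitted values of a model containing the intercept, x, and x^2 unchanged; it only reparameterizes the coefficients.

```python
import numpy as np

def vif_pair(a, b):
    """VIF for either predictor in a two-predictor model: 1 / (1 - r^2)."""
    r = np.corrcoef(a, b)[0, 1]
    return 1.0 / (1.0 - r**2)

x = np.arange(1, 21, dtype=float)   # a strictly positive predictor, 1..20
xc = x - x.mean()                   # centered: -9.5..9.5, symmetric about 0

vif_raw = vif_pair(x, x**2)         # x and x^2 are nearly collinear
vif_centered = vif_pair(xc, xc**2)  # symmetric xc makes corr(xc, xc^2) zero
print(f"VIF raw: {vif_raw:.1f}, VIF centered: {vif_centered:.1f}")
```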

    If we break our data into training and validation sets and run logistic regression and other models on them, we can see whether we are overfitting or underfitting our models. By doing that, we create an "explanatory" model that has predictive power.
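
    A minimal sketch of that check, with two substitutions labeled up front: it uses polynomial least squares on simulated data instead of logistic regression (to keep it dependency-light), and a single 70/30 split like the one described for the Evans County data. Training error can only fall as terms are added; the validation column is the one that reveals underfitting (degree 1 here, since the simulated curve is quadratic) and shows whether higher-degree terms buy anything.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(-1, 1, n)
y = 1.0 + x - 2.0 * x**2 + rng.normal(0, 0.3, n)  # the true curve is quadratic

idx = rng.permutation(n)
train, valid = idx[:70], idx[70:]                  # 70/30 split

results = {}
for degree in range(1, 8):
    coefs = np.polyfit(x[train], y[train], degree)
    mse_train = np.mean((y[train] - np.polyval(coefs, x[train])) ** 2)
    mse_valid = np.mean((y[valid] - np.polyval(coefs, x[valid])) ** 2)
    results[degree] = (mse_train, mse_valid)
    print(f"degree {degree}: training MSE {mse_train:.3f}, validation MSE {mse_valid:.3f}")
```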

    Perhaps using the ideas behind predictive models, we can improve the predictive abilities of explanatory models. Then, we would only need one method for everything, instead of multiple methods. 

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 11.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 14:09

    Very interesting discussion. Lately I have been pondering whether a model with interactions and higher-order terms is really less parsimonious than a linear model. This assumes there are enough data to estimate the functional form well. It still uses the same number of variables, and whether the model is linear or polynomial, you should be wary of extrapolation and of estimation at the boundaries. I realize overfitting can be dangerous with too many terms and you need to be cautious, but assuming enough data are present, is a polynomial more "complex" than a line, or does it just have more terms mathematically? Obviously, if a line is most correct, use the line; but if there is curvature and you model it, to me that is not a more complex model, just a better one. To me, adding variables is a definite increase in complexity, because we would need to know more information for prediction.

    Do you think complexity is determined by the number of variables or the number of terms, in a philosophical sense? I understand there are more parameters to estimate with the more complex models, but if there are enough data to estimate them, is that relevant?
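
    One concrete way to put numbers on this question is to count free coefficients, which is what penalized criteria such as AIC (2k - 2 log L) charge for. With p predictors, a full second-order model has 1 + 2p + p(p - 1)/2 coefficients versus 1 + p for the linear model, so the gap grows quadratically in p even though the number of variables is unchanged. The helper below is a hypothetical illustration of that count.

```python
from math import comb

def n_mean_parameters(p, squares=False, two_way=False):
    """Free coefficients in a regression on p predictors (intercept included,
    error variance not counted)."""
    k = 1 + p                # intercept + one slope per predictor
    if squares:
        k += p               # one squared term per predictor
    if two_way:
        k += comb(p, 2)      # every pairwise product
    return k

for p in (3, 10):
    print(p, n_mean_parameters(p), n_mean_parameters(p, squares=True, two_way=True))
```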

    Thanks.

    Laura

    ------------------------------
    Laura Kapitula
    Assistant Professor
    Dept. of Statistics
    Grand Valley State University



  • 12.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 15:04

    Fascinating discussion!

    There's a paper by Barbieri and Berger ("Optimal predictive model selection", Annals of Statistics, 2004) that says that, in the context of Bayesian model selection with a large number of additive components prepared in advance, the best predictive model is the one that includes all terms with posterior probability of inclusion > 0.5. Similarly, Frank Harrell has mentioned in a talk (if I recall correctly) that if prediction is a goal, one should include all terms with p-values < 0.5 (rather than a cutoff of 0.05 or 0.1). This flies in the face of the standard paradigm many of us learned in graduate school: that we should remove non-significant terms (for some meaning of "non-significant") when selecting a model. Frankly, it seems that if prediction is the goal, that fundamental paradigm is not supported by statistical theory or by anecdotal experience. Maybe it's time to start teaching something else!
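
    The Barbieri and Berger result concerns what they call the median probability model. Given posterior probabilities for a set of candidate models, each variable's posterior inclusion probability is the total probability of the models that contain it, and the median probability model keeps the variables whose inclusion probability exceeds 0.5. The sketch below uses invented posterior probabilities; note that the model it selects need not be the single most probable model.

```python
# Invented posterior probabilities for five candidate models (they sum to 1).
posterior = {
    frozenset(): 0.05,
    frozenset({"x1"}): 0.30,
    frozenset({"x1", "x2"}): 0.35,
    frozenset({"x2"}): 0.10,
    frozenset({"x1", "x3"}): 0.20,
}

def inclusion_probabilities(posterior):
    """Posterior inclusion probability of each variable: sum over models containing it."""
    probs = {}
    for model, prob in posterior.items():
        for var in model:
            probs[var] = probs.get(var, 0.0) + prob
    return probs

incl = inclusion_probabilities(posterior)
median_model = {var for var, prob in incl.items() if prob > 0.5}
most_probable = max(posterior, key=posterior.get)

print({var: round(prob, 2) for var, prob in sorted(incl.items())})
print("median probability model:", sorted(median_model))  # ['x1']
print("highest posterior model:", sorted(most_probable))   # ['x1', 'x2']
```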

    I was also recently reading a paper by Leo Breiman, "Statistical Modeling:  The Two Cultures" (Statistical Science, 2001), and the discussion which follows, which largely recapitulates this entire discussion.

    Personally I strive to find the best-predictive model I can, and then if interpretation is the goal, see what parsimonious model comes close.  But I'm not willing to sacrifice too much predictive quality, because then I feel that I'm not reporting what the data is saying.  If I can't find a simple model that predicts pretty well, then I have to report to my clients or interested parties that the data is inherently complex.

    Sometimes we design experiments that are sparse in the design space (fractional factorials, etc.) and admit only simple models; in these cases we are implicitly admitting at the outset that we are only going to find the heavy hitters in our processes. When data are very expensive to collect, this is the best we can do, and it's still worth doing, but we have to recognize that we might not be getting the best possible predictions.
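
    To make the sparsity point concrete: in the smallest fractional factorial, a 2^(3-1) design with generator C = AB, each main effect shares its column with a two-factor interaction, so the design supports a main-effects model but cannot separate those effects from the interactions. The sketch below builds the four runs and checks the aliasing.

```python
from itertools import product

# 2^(3-1) half fraction: a full 2^2 factorial in A and B, with C set by the generator C = AB.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
A, B, C = ([run[i] for run in runs] for i in range(3))

AB = [a * b for a, b in zip(A, B)]
BC = [b * c for b, c in zip(B, C)]

# Identical columns mean aliased effects: with only these 4 runs the estimated
# "C effect" is really C + AB, and the "A effect" is really A + BC.
print(len(runs), C == AB, A == BC)  # 4 True True
```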

    ------------------------------
    Jim Garrett
    Sr. Assoc. Dir. of Biostatistics
    Novartis



  • 13.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 17:03

    RE: Frankly, it seems that if prediction is the goal, that fundamental paradigm is not supported by statistical theory or by anecdotal experience.  Maybe it's time to start teaching something else!

    RESP: It is time to upgrade the teaching of statistics to would-be applied statisticians.  A typical two-year M.S. in statistics provides about a year's worth of applied training.  The B.S. in statistics is much more useful. 

    Theory: There is statistical theory supporting all four statistical modeling paradigms: prediction, coefficient estimation, grouping, and ranking ... all for situations involving uncertainty. We lapse into treating the theory for coefficient estimation as the main show.

    Anecdotal Experience: Respectfully, we have been working on these problems for decades and we have the (largely unpublished) learnings to show for it, which a few of us call Best Statistical Practice.  There are so many people working on these problems and for so long that new ideas are uncommon. 

    RE: I was also recently reading a paper by Leo Breiman, "Statistical Modeling:  The Two Cultures" (Statistical Science, 2001), and the discussion which follows, which largely recapitulates this entire discussion.

    RESP: This is a confused paper illustrating how many academic statisticians have completely lost sight of applied statistics. Leo's heavily insulated 'field trip' taught him nothing about what applied statisticians have been doing in the field since forever. In sharp contrast, the paper's lone comment from an applied statistician is well informed. Academic statisticians are far more removed from what is going on in the field than they realize; see Amstat News.

    ------------------------------
    Randy Bartlett



  • 14.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 19:00

    I think Randy hit on something important.

    Right now I am arguing with the program heads/advisors of two MS programs I belong to (at two different universities) about what IS important and what the programs require and offer.

    The MA Applied Stats program I am part of recently changed its degree requirements. It used to be 6 of 10 courses "Math Proof" based, 2 stat theory classes, and 3 applied stats classes, one of which was an elective. Now it is 2 stat theory/proof classes, 2 math classes (one proof, one applied or proof), and 6 applied stats classes. Since the department also offers an MA in Mathematics, which is struggling to keep students, they won't let stats students take courses in database systems, business intelligence, data mining/data science, etc. All classes must come from the math department. I had an argument with the advisor over whether Modern/Abstract Algebra 1 and 2 were "good alternatives" for the math electives.

    I told the advisor, "I'm taking database courses from the business school or I'll walk!" He was totally against it. He didn't think they were "worthwhile" courses for an applied statistician. He feels statisticians need to know basic R and SAS, which are covered in the classes. None of the stats professors in the program teach data manipulation as part of their courses, and no one in the program knows anything about SQL, etc. They also feel "Bayesian" is a fad, forecasting and time series are not important, data mining is a computer science class rather than a stats class, and a survey stats class won't attract enough students. So they don't offer it. (Hello, Clueless!)

    The MS Business Analytics program I belong to offers 3 "stats" classes: Intro to Business Stats, Applied Forecasting, and Data Mining for Business. In this program, everyone picks a 4-course specialization. I chose Data Management. The MIS department has Data Mgt 1 and 2, Business Intelligence, and Programming 1 and 2 all on the books. However, they don't offer the programming courses or the Data Mgt 2 course, because students won't sign up for them. They also require Intro to Info Systems, a course that tells us computers are good and useful. It's also a total waste of time and of an elective!

    The advisor for this program is under the assumption that "Big Data" analysis can be done with Excel and Minitab. (They can handle up to 100,000 rows!) R, SAS, Hadoop, MapReduce, etc., are NOT NEEDED! He got this information from a group of industry folks. These higher-ups in industry claim, "We hire people based upon their ability to learn. We'll teach them what they need to know." Meanwhile these same companies require candidates "with significant experience" in SAS, R, C++, SQL, Hadoop, etc. I want to take these courses from the engineering department. However, the advisor claims I can't use them as electives because the engineering department is not accredited by the same group that accredits the business school. So I can't take Big Data Analysis, Pattern Recognition and Neural Networks, Data Mining 1 and 2, etc. When I pointed out that many other universities cross-list these types of classes, or even require non-business-school classes for their business degrees, the advisor needed some time to come up with his next excuse. (Again, CLUELESS!!!)

    Needless to say, I am tired of the silliness and stupidity of these programs. Interestingly enough, both of these universities are going to offer new bachelor's degrees in Data Science. Each degree requires 4-5 applied stats classes, 4-5 "Data Science" classes, 4-5 "programming" classes, and some areas of specialization. One of these departments is doing it all on its own, over the objection of the math and stats department. The other program worked with the math department and is coordinating course offerings. Anybody want to guess the probability that I drop my master's programs and go for a second bachelor's degree?

    Of all the programs I have seen, these Data Science programs make the most sense. You cover the important topics in each discipline. You are not an expert in any one field, but you can see the arguments each field makes and judge their validity. You also have an understanding of an area outside of math and comp sci to apply your methods to. So you understand where your clients are coming from and what they might want from you, and you can suggest better methods for modeling. Sounds like a big win to me;-) (And like someone actually gets it.)

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 15.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 20:31

    "The advisor for this program is under the assumption that "Big Data" analysis can be done with Excel and Minitab. (They can handle up to 100,000 rows!) R. SAS. Hadoop, Map Reduce, etc, are NOT NEEDED! He got this information from a group of industry folks. These higher ups in industry claim, "We hire people based upon there ability to learn. We'll teach them what they need to know." Meanwhile these same companies require candidates "with significant experience" in SAS, R, C++, SQL, Hadoop, etc."

    He's hearing what he wants to hear. Yes, indeed, companies do hire based on your ability to learn. But how would your resume indicate you have the ability to learn? Because you've already learned relevant stuff. If you've shown you're a whiz in R, it's probably a pretty good bet that you can become a whiz in the IBM toolset or the SAS toolset. (But if everything else were equal, I'd still hire a guy with immediately useful skills. Who wouldn't?)

    But if all you can say is "I know Excel and Minitab and I'm willing to learn" -- how would I, as an employer, know you are a good risk? Frankly, you'd come across as naive. And you'd probably not even pass the HR resume screen to get an interview.

    Maybe an online course at statistics.com in SQL or predictive analytics would push your resume past the HR screen.

    Good luck! 

    ------------------------------
    Michael Kruger
    M W Kruger Consulting



  • 16.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 22:38

    I live in the Metro Detroit area. We have several places that "hire" statisticians. The only place that routinely hires statisticians directly is the University of Michigan. The auto companies, auto suppliers, etc., all hire indirectly: they bring people in as contractors first, then hire them directly later.

    The issues I am facing are, "What does it take to be a statistician?"  and "What counts as experience?".

    Suppose you had an opening to work with scientists and engineers on their research projects (statistical consulting), as CSCAR does at the University of Michigan. You come across the resume of someone who spent 12 years working as a chemist, designing and analyzing experiments for their research labs (academic and industrial) and for friends. They have taken 16 stats classes and another half dozen operations research courses at the graduate level. They hold 2 master's degrees and have worked directly as a "statistician". Would you at least contact that person and discuss things via email? Would you interview them? Suppose further that you and several of your coworkers have talked to this person before, and you know they will bring a new perspective to your group. And, during your talks with them, they were able to solve all the problems you had with the projects you were working on and had worked on in the past. How do you handle their resume if it comes across your desk? Right, you avoid this person at all costs. Why? If you are U of M, you feel the years of experience working as a chemist/statistician in a lab are not "real" work as a statistician. Plus, the candidate didn't publish any of their work as an industrial chemist. So the candidate only has "2 years" of experience. If you are Henry Ford Health System, the candidate doesn't have a "stats" degree, so they are ineligible. As for the contracting companies, most of them are less useful than used toilet paper. (Yes, I am referring to myself.)

    I applied to a PhD program in Industrial Engineering at Wayne State University and sent some e-mails to the head of the Research Development and Analysis (statistical consulting) group at WSU, volunteering to work there. Since I'm not in the psychology department specializing in research methods, I am not qualified for their positions.

    Right now, I am working on a data set, via a capstone course, from which I can't easily extract any meaningful results. We want to find drivers of bad survey results. The company already has a satisfaction score of 94%, and they only need to stay above 85%. All we are doing is modeling noise.

    We took the time to ask what the company wanted. They wanted an explanatory model with good predictive power. They think it would be "value added" to be able to take customer data from incident tickets and predict how customers will respond. If I use an explanatory model, every factor I put in the model is "significant" because I have 100,000+ tuples of data. CART, random forests, boosting, and bagging are not helping. Neural networks are no help either. (Asking the company to be "horrible for a fiscal quarter" is not an option, though they did get a good laugh out of that;-)
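
    The "everything is significant" problem follows from standard errors shrinking like 1/sqrt(n). The sketch below uses invented satisfaction rates (94.0% vs. 93.6% in two customer segments) and a standard two-proportion z-test with a pooled standard error: the same 0.4-point difference is nowhere near significant with 500 customers per group and comfortably significant with 50,000. With that much data, effect sizes and out-of-sample predictive performance say more than p-values do.

```python
import math

def two_proportion_z(p1, p2, n_per_group):
    """Two-sided z-test for a difference in proportions (pooled SE, equal group sizes)."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_group))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Invented rates: 94.0% vs. 93.6% satisfied, a 0.4-point difference.
results = {n: two_proportion_z(0.940, 0.936, n) for n in (500, 50_000)}
for n, (z, p) in results.items():
    print(f"n per group = {n:>6}: z = {z:.2f}, two-sided p = {p:.4f}")
```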

    Through these, and my other experiences in data analysis, I doubt I'll be using traditional methods for much other than the analysis of designed experiments. My stats profs might give me dirty looks. But, I'd rather get the right answer to the right problem than an answer that "looks good on paper". 

    I have no more methods to try on the data, unless I start building a corpus for text mining. Then I'll need to build that corpus from French, German, Spanish, and English written responses. (All of which I have done without the use of Google or Bing.) The text responses are so few and so short that I would need more data than I have to build the corpus.

    So now I'm looking at how to explain why my prediction model failed... along with everything else. 

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 17.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 21:57
    A simple way to assess the complexity/parsimony of a model is to ask how many free parameters are being estimated. Introducing powers of a variable and products of variables means more coefficients to estimate, hence less parsimony.

    David Greenberg, Sociology Dept., New York U
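Counting free parameters as David suggests is mechanical. A small sketch, assuming three hypothetical inputs x1, x2, x3, lists every term of a linear model and of a full quadratic (squares plus pairwise interactions), intercept included.

```python
from itertools import combinations_with_replacement


def polynomial_terms(variables, degree):
    """List every monomial up to `degree`, including the intercept ("1")."""
    terms = ["1"]
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(variables, d):
            terms.append("*".join(combo))
    return terms


inputs = ["x1", "x2", "x3"]          # hypothetical input variables
linear = polynomial_terms(inputs, 1)
full_quadratic = polynomial_terms(inputs, 2)
print(len(linear), linear)
print(len(full_quadratic), full_quadratic)
```

The parameter count grows from 4 to 10, while both models still need the same three inputs to make a prediction, which is the distinction Laura raises in her message.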

    On Tue, Mar 22, 2016 at 2:09 PM, Laura Kapitula via American Statistical Association



    ------Original Message------

    Very interesting discussion. I have been pondering lately: is a model with interactions and higher-order terms really less parsimonious than a linear model? This assumes there is enough data to estimate the functional form well. It still uses the same number of variables, and whether the model is linear or polynomial, you should be wary of extrapolation and of estimation at the boundaries. I realize overfitting can be dangerous with too many terms and you need to be cautious, but assuming enough data is present, is a polynomial more "complex" than a line, or does it just have mathematically more terms? Obviously if a line is the most correct, use the line, but if there is curvature and you model it, to me that is not more complex, just a better model. To me, adding variables is a definite increase in complexity, because we would need to know more information for prediction.

    Do you think complexity is determined by the number of variables or the number of terms, in a philosophical sense? I understand there are more parameters to estimate with the more complex models, but in the presence of enough data to estimate them, is that relevant?

    Thanks.

    Laura

    ------------------------------
    Laura Kapitula
    Assistant Professor
    Dept. of Statistics
    Grand Valley State University
    ------------------------------


  • 18.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 23:11

    David,

    I understand that traditional argument, that the number of parameters determines the complexity. I am just contemplating the logic of that. I understand the estimation piece, but suppose you have enough data that estimating those parameters is not a problem and can be done reasonably well with smallish standard errors. Then, from my point of view, the complexity is not really greater for a model with polynomial terms and interactions, because if your goal is prediction you still need to know the same number of variables. So if you use a spline, a polynomial, or some other nonlinear form, I think it is not really less parsimonious from the prediction point of view, because for prediction what matters most is the number of variables needed to predict, not the number of parameters.
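One way to weigh both views is an information criterion such as BIC, which charges ln(n) per estimated parameter, so with enough data a genuine improvement in fit easily outweighs the cost of an extra term. A minimal standard-library sketch on simulated, genuinely curved data (the true curve, noise level, sample sizes, and seed are assumptions for illustration): the quadratic estimates one more parameter but uses the same single input x.

```python
import math
import random


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients via the normal equations (adequate for low degree)."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]   # X'X
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]             # X'y
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * k
    for i in reversed(range(k)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, k))) / a[i][i]
    return coef


def sse(coef, xs, ys):
    """Sum of squared errors of the polynomial with these coefficients."""
    return sum((y - sum(c * x ** i for i, c in enumerate(coef))) ** 2 for x, y in zip(xs, ys))


def simulate(n):
    # Hypothetical curved relationship with unit-variance noise
    xs = [random.uniform(-2, 2) for _ in range(n)]
    ys = [1 + 2 * x + 0.5 * x ** 2 + random.gauss(0, 1) for x in xs]
    return xs, ys


random.seed(42)
train_x, train_y = simulate(5000)
test_x, test_y = simulate(5000)
n = len(train_x)

results = {}
for degree in (1, 2):
    coef = fit_poly(train_x, train_y, degree)
    k = degree + 1                                   # parameters estimated
    bic = n * math.log(sse(coef, train_x, train_y) / n) + k * math.log(n)
    test_mse = sse(coef, test_x, test_y) / len(test_x)
    results[degree] = (bic, test_mse)
    print(f"degree={degree}  params={k}  BIC={bic:.1f}  test MSE={test_mse:.3f}")
```

Here the quadratic should win on both BIC and held-out error; with a small n or a truly linear curve, the ln(n) penalty would favor the line instead.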

    Laura

    ------------------------------
    Laura Kapitula
    Assistant Professor
    Grand Valley State University



  • 19.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 13:59

    I'm not an expert on super learning, but my understanding was that super learning is just a "super ensemble", since you combine predictions from very different types of "learners" (e.g., random forests, logistic regression). I don't think that the fitting of the models focuses on different "regions" or "aspects" of the data; rather, the super learner works because one candidate learner may work better in one region while another candidate learner works better in another region. It is true that the averaging of the models is done in a data-adaptive way, but this step is subject to cross-validation, which protects against overfitting. I am very interested to hear others' views here, because I am far from confident about super learning.
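A stripped-down version of the idea as described above, sketched in standard-library Python with two hypothetical candidate learners (a straight line and k-nearest neighbours) on simulated data: each learner's out-of-fold predictions are collected, and a convex weight on the two is chosen to minimize cross-validated squared error. Published super learner implementations typically fit weights for many learners by constrained least squares; the single-weight grid search here is a simplification for illustration.

```python
import math
import random


def fit_linear(xs, ys):
    """Candidate learner 1: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=15):
    """Candidate learner 2: average of the k nearest training responses."""
    pts = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


def cv_predictions(xs, ys, learners, folds=5):
    """Out-of-fold predictions for each learner (the 'level-one' data)."""
    n = len(xs)
    idx = list(range(n))
    random.shuffle(idx)
    preds = {name: [0.0] * n for name in learners}
    for f in range(folds):
        held_out = set(idx[f::folds])
        tr_x = [xs[i] for i in range(n) if i not in held_out]
        tr_y = [ys[i] for i in range(n) if i not in held_out]
        for name, fit in learners.items():
            model = fit(tr_x, tr_y)
            for i in held_out:
                preds[name][i] = model(xs[i])
    return preds


def mse(pred, ys):
    return sum((p - y) ** 2 for p, y in zip(pred, ys)) / len(ys)


def blend(alpha, z):
    """Convex combination: weight alpha on the linear learner, 1 - alpha on k-NN."""
    return [alpha * lin + (1 - alpha) * nn for lin, nn in zip(z["linear"], z["knn"])]


random.seed(7)
xs = [random.uniform(0, 6) for _ in range(300)]
ys = [math.sin(x) + 0.3 * x + random.gauss(0, 0.3) for x in xs]   # hypothetical data

z = cv_predictions(xs, ys, {"linear": fit_linear, "knn": fit_knn})
alpha = min((i / 100 for i in range(101)), key=lambda a: mse(blend(a, z), ys))
cv_linear, cv_knn, cv_blend = mse(z["linear"], ys), mse(z["knn"], ys), mse(blend(alpha, z), ys)
print(f"weight on linear={alpha:.2f}  CV MSE: linear={cv_linear:.3f}  knn={cv_knn:.3f}  blend={cv_blend:.3f}")

# Final predictor: refit both learners on all the data and combine with the chosen weight
final_linear, final_knn = fit_linear(xs, ys), fit_knn(xs, ys)


def super_learner(x):
    return alpha * final_linear(x) + (1 - alpha) * final_knn(x)
```

Because the weight is chosen on the same out-of-fold predictions it is scored on, `cv_blend` is mildly optimistic; an honest estimate of the ensemble's error would wrap the whole procedure in an outer cross-validation loop.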

    ------------------------------
    Robert Podolsky
    Associate Professor
    Wayne State University



  • 20.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 17:36

    I think we had best start with clear, succinct definitions of "prediction" models and "explanatory" ("interpretable") models. The wording in your posting seems, at least to me, to use the terms to describe models that differ only in their complexity and not in their genesis. Good places to start would be:

    o  "Use and Abuse of Regression" / Technometrics v8 n4 625-629 / George E. P. Box

    o  "The Use and Misuse of Multiple Regression" / Industrial Quality Control Oct 1966 184-189 / Gerald Hahn and S. S. Shapiro

    o  "To Explain or to Predict?" / Statistical Science 2010 v25 n3 289-310 / Galit Shmueli

    ------------------------------
    Wayne Fischer
    Statistician
    University of Texas Medical Branch



  • 21.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 19:37

    This is a great topic for community discussion. The topic is a bigger issue than simply the choice of statistical procedure or method.

    For example, the analysis aims might influence which covariates you consider in the model, since variables that improve prediction may be known not to be part of some underlying causal mechanism (attempting to identify and interpret such a mechanism could be an analysis goal separate from numeric prediction). Further, when the goal is numeric prediction, even regression methods may not be entirely interpretable. For example, one might use penalized regression (e.g., LASSO) for variable selection and improved predictions. However, we know that the resulting point estimates for covariates are biased, and confidence intervals and p-values are not straightforward; thus, even regression methods may not be great for interpretation.
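The shrinkage bias of the LASSO can be seen directly. A minimal standard-library sketch of the lasso fitted by coordinate descent, assuming the common (1/2n)·||y − Xb||² + λ·||b||₁ scaling, centered data with no intercept, and hypothetical simulated data with true coefficients 2.0, 0.1, and 0 (λ = 0.3 and the seed are also assumptions). Setting λ = 0 in the same routine gives ordinary least squares for comparison.

```python
import random


def soft_threshold(z, lam):
    """Closed-form solution of the one-dimensional lasso problem."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def lasso_cd(cols, y, lam, sweeps=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Assumes y and every column are centered (no intercept); lam = 0 gives OLS.
    """
    n, p = len(y), len(cols)
    b = [0.0] * p
    resid = list(y)                                   # y - Xb, starting from b = 0
    sq = [sum(v * v for v in c) / n for c in cols]
    for _ in range(sweeps):
        for j, c in enumerate(cols):
            # Correlation of column j with the partial residual (its own term added back)
            rho = sum(cj * (r + cj * b[j]) for cj, r in zip(c, resid)) / n
            new = soft_threshold(rho, lam) / sq[j]
            delta = new - b[j]
            if delta:
                resid = [r - delta * cj for r, cj in zip(resid, c)]
                b[j] = new
    return b


random.seed(1)
n = 500
x1, x2, x3 = ([random.gauss(0, 1) for _ in range(n)] for _ in range(3))
y = [2.0 * a + 0.1 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]   # x3 has no effect
cols = [center(x1), center(x2), center(x3)]
y = center(y)

ols = lasso_cd(cols, y, lam=0.0)
lasso = lasso_cd(cols, y, lam=0.3)
print("OLS:  ", [round(v, 3) for v in ols])
print("LASSO:", [round(v, 3) for v in lasso])
```

The strong coefficient is pulled toward zero by roughly λ relative to least squares, and the weak and null coefficients are set exactly to zero: good for selection and prediction, but not an unbiased estimate of each effect.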

    So, how do I decide when a predictive model is the best choice, and how do I mitigate the risk that a client will change their mind? I offer examples and 'potential' interpretations during the discussion of the project hypotheses/aims. It's important to communicate that building an analysis (including specific modeling methods) for a specific goal may yield a result that is not well-suited for other goals. Basically, if we knew the 'true model', including all important covariates and their functional relationship to the outcome, it would be well suited to any goal. But since we never know the 'true model', we make decisions with corresponding consequences (trade-offs).

    ------------------------------
    Phillip Schulte
    Mayo Clinic



  • 22.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 22:20

    RE: '... often at the expense of interpretability of the model.'

    RESP: Among many of our clients, there is this loose idea that we want models that are both interpretable and predictive at the same time. We seldom want this. In academic statistics, they usually want to estimate coefficients so that they can learn about causality. In applied statistics, if we need to predict, then we build a model for that. If we need to interpret, then we build a coefficient model. If we need both, then we build two models.

    I have written a number of related blog posts on Statistical Denial:

    https://datafloq.com/read/Statistical-Significance-Does-Work-Big-Data/1385

    ALL predictive models involve uncertainty; they are statistical models, even though many people using them are clueless about the underlying statistical assumptions.

    https://datafloq.com/read/prediction-is-is-not-part-of-statistics/1383

    ------------------------------
    Randy Bartlett



  • 23.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-21-2016 23:26

    What an interesting and informative discussion!  

    I am trying to understand Randy's comment on the distinction between a model being interpretable versus a model being useful for predictive purposes and how these two objectives may warrant two different models for the same data set. 

    I can see how it is possible to have an interpretable model with poor predictive performance if the residual variation is too large.  

    I can also see how it is possible to have a model that is not easily interpretable but has good predictive performance. 

    But I can't really see why it is not desirable to have a model which is both interpretable (since it uncovers and quantifies important relationships/associations between variables of interest) and has good predictive performance.

    Model interpretability considerations require us to include variables in our model which have a statistically insignificant effect on the outcome variable but are scientifically relevant. One could argue that such variables can diminish the predictive performance of the model (though hopefully not by much), but one might be hard pressed to ignore such variables even if prediction is the final goal. 

    The way I see it, model interpretability is first driven by the science behind the data generating mechanism and then by statistical considerations. Once we have a sensible model for the data, we can use that same model for predictive purposes provided it has a good predictive performance. Whether or not it has good predictive performance is ultimately driven by the ratio of signal to noise. If the noise overwhelms the signal, we can throw increasingly sophisticated models at the data (which are less and less interpretable) without being able to overcome the fact that there is simply too much noise to contend with. 
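The signal-to-noise point has a standard formal version. Assuming an additive model Y = f(X) + ε with E[ε | X] = 0 and Var(ε) = σ², the cross term vanishes, so no predictor g, however sophisticated, can get below the noise floor, and the best achievable R² is capped:

```latex
\begin{aligned}
\mathbb{E}\big[(Y - g(X))^2\big] &= \mathbb{E}\big[(f(X) - g(X))^2\big] + \sigma^2 \;\ge\; \sigma^2, \\
R^2_{\max} &= 1 - \frac{\sigma^2}{\operatorname{Var}(Y)}
           = \frac{\operatorname{Var}(f(X))}{\operatorname{Var}(f(X)) + \sigma^2}.
\end{aligned}
```

For instance, if Var(f(X)) = 1 and σ² = 9, even the true f explains at most 10% of the variance, and swapping in a less interpretable method cannot change that ceiling.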

    Thanks, 

    Isabella

    ------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.



  • 24.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 07:32

    This could be a bit of a sidebar, but -

    The difference between "explanatory" and "predictive" modeling lies in the objective of the modeling, not in the methodologies employed. Specifically, the objective of an explanatory model is to explain what is important, and the objective of a predictive model is to predict accurately. The methodological impact of this distinction is that the former "optimizes" on the noise at the independent-variable level, whereas the latter "optimizes" on the noise at the dependent-variable level (note that I use "optimize" in a very loose way here). The results are often stunningly different. And it's the objective of the modeling that really determines which methodologies are possible and which are not (or at least not appropriate). FWIW, in my biased set of experiences, those who are more research-focused tend to hold philosophies closer to those of explanatory modeling, whereas those who are more business-focused tend to hold philosophies closer to those of predictive modeling.

    What ends up happening is that some of the more traditional best practices in predictive modeling include interpretability of the models, in that a model has to make business sense. This usually ends up as a sort of compromise if you're coming from the purist predictive-modeling extreme. Credit scoring is a perfect example, where the interpretability of a predictive model is essentially legally mandated.

    Because of these objectives, explanatory modeling is generally associated with policy/strategy making, whereas predictive modeling is often associated with operational/tactical decision making. This is not to say that the capability of predictive modeling is not strategic, but rather that the typical use case of predictive models is very operational and tactical in nature (making lots of individual decisions repeatedly).

    I see all the time that the differences between the two are not well understood, especially given the current attention on predictive analysis in the general world out there. The term "predictive analysis" is often misused and thrown around, and non-statisticians even use it to mean "modeling" in general (which makes me cringe).

    P.S. You all just hit on my hot-button topic :-)

    ------------------------------
    Michiko Wolcott
    Principal Consultant
    Msight Analytics



  • 25.  RE: Methods for Explanation and Prediction - CNSL Monthly Official Discussion Topic for March/April, 2016

    Posted 03-22-2016 11:39

    Michiko,

    Your comment is the main bar and not a sidebar.  You write like an applied statistician. 

    RE: I do see all the time that the differences between the two are not very well understood, especially because of the current attention on the predictive analysis in the general world out there, and the word "predictive analysis" is often misused and thrown out there, and even used to mean "modeling" in general by non-statisticians (which makes me cringe).

    P.S. You all just hit on my hot-button topic :-)

    RESP: We see this confusion ALL the time too, especially outside applied statistics.  Using two models is proven BSP (Best Statistical Practice) among applied statisticians.  Convincing clients, who are mixing the two, requires a relationship.  The only ref I can give for this and other BSP is my own book (fortunately): http://amzn.to/YGhXzv.  

    ------------------------------
    Randy Bartlett