ASA Connect

  • 1.  (Pithy) Beginner's Tips for Regression

    Posted 06-23-2020 20:28
    Hi folks,

    Many of us remember back to our "introduction to regression" classes, in which the instructor had many rules/reminders/tips on how to perform a sound regression.

    These rules/reminders/tips typically concerned checking model assumptions or practical rules of thumb.

    Examples include...
    - Check if your errors are normally distributed.
    - Check for homoscedasticity
    - Are your coefficients practically significant? Or just statistically significant?
    - Have at least X events for each predictive variable. (https://en.wikipedia.org/wiki/One_in_ten_rule)
    - Look for outliers in the error terms.
    - If you have interaction terms, include the base terms as well. (Unless you have a good reason not to.)
    - Compare your chosen model to the "kitchen sink" model
    - Stop adding variables when ________.
    - Remove some variables when ________.
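
    The "at least X events per predictor" tip can be made concrete for a binary outcome. The sketch below is illustrative only: the function name, the default of X = 10 (from the linked one-in-ten rule), and the study numbers are assumptions, not advice from this thread.

```python
def max_predictors(n_events, n_nonevents, events_per_variable=10):
    """Rough upper bound on predictors under an events-per-variable rule.

    For a binary outcome the limiting count is the rarer of the two classes.
    """
    if events_per_variable <= 0:
        raise ValueError("events_per_variable must be positive")
    limiting = min(n_events, n_nonevents)
    return limiting // events_per_variable


# Hypothetical study: 1,000 subjects, 62 of whom had the event.
print(max_predictors(n_events=62, n_nonevents=938))
```

    The total sample size does not enter the calculation; only the count in the rarer outcome class does.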

    No doubt that I'm missing quite a few!

    While sticking to the most basic regression methods (e.g., linear, logistic, survival), would people mind listing their favorite regression rules of thumb?

    No rule is too trivial or too pithy! (I'm interested in the breadth of advice given to beginners to better understand the priorities teachers have for beginning students.)

    Thank you!

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    Data Scientist at Cenduit LLC, Durham, NC
    ------------------------------


  • 2.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-24-2020 11:31

    There are always Andrew Gelman's six tips. See also his "Forking paths vs. six quick regression tips", too.

    Given your examples, you might find his "It's not about normality, it's all about reality" of interest, too.

    Bill



    ------------------------------
    Bill Harris
    Data & Analytics Consultant
    Snohomish County PUD
    ------------------------------



  • 3.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-24-2020 11:35
    Plot the Data!!! :-)
    (Before you even start thinking about regression)

    ------------------------------
    Fred Girshick
    ------------------------------



  • 4.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-25-2020 17:56
    Further practical (not statistical) tips. These should get you on the way to building good, predictable, stable models:

    - If you are not a Subject Matter Expert (SME), get one.
    - Look at bivariate relationships to find variables that have significant relationships in the "correct direction" (as per the SME).
    - Eliminate or truncate outliers from the bivariate analysis.
    - Use bivariate analysis to "fix" nonlinear relationships. There are two ways to fix them: nonlinear transforms or binning (my favorite).
    - At each step in the regression, look at the direction of the new variable to make sure it is the same as in the bivariate analysis and still in the "correct" direction (as per the SME).
    - At each step in the regression, look at how much the new variable adds to predictive value and whether it changes the coefficient directions of other variables.
    - Check the final model for Variance Inflation Factor or similar measures of intercorrelation issues.
    - Check your model against a hold-out sample from the original data to protect against overfitting.
    - Check your final model against an independent (out-of-time and/or out-of-space) sample to check for stability and out-of-sample effectiveness.
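
    For anyone who has not computed a Variance Inflation Factor by hand, here is a minimal sketch. It relies on the fact that the VIFs are the diagonal of the inverse of the predictors' correlation matrix. The function names and the example columns are made up for illustration; in practice you would use your statistical package's own VIF routine.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length predictor columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are (nearly) perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """One VIF per predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


# Hypothetical predictors: x1 and x2 are correlated at r = 0.8.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(vif([x1, x2]))
```

    With only two predictors, each VIF reduces to 1 / (1 - r^2), which is a handy sanity check on the output.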





    ------------------------------
    Michael Mout
    MIKS
    ------------------------------



  • 5.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-26-2020 17:47
    Fred - 

    I agree. Plot the data. And for regression, do a graphical residual analysis to examine fit, and perhaps use cross-validation to check whether you have overfit to your particular sample.
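
    As a rough illustration of the cross-validation idea, here is a minimal sketch for a one-predictor linear model. The function names, the interleaved fold assignment, and the data are assumptions made for this example; rows that arrive in a meaningful order should be shuffled before folds are assigned.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def mse(xs, ys, b0, b1):
    """Mean squared error of the line b0 + b1 * x on the given points."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_mse(xs, ys, k=5):
    """Mean squared error on held-out points, over k interleaved folds."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))  # every k-th row, starting at `fold`
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        b0, b1 = fit_line(train_x, train_y)
        total += sum((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in held_out)
    return total / n


# Hypothetical data, made up for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
b0, b1 = fit_line(xs, ys)
print("in-sample MSE:", mse(xs, ys, b0, b1))
print("5-fold CV MSE:", k_fold_mse(xs, ys, k=5))
```

    A cross-validated error far above the in-sample error is a sign that the model is fitting the particular sample rather than the underlying relationship.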

    Cheers.

    ------------------------------
    James Knaub (Jim)
    Retired Lead Mathematical Statistician
    ------------------------------



  • 6.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-25-2020 08:03
    Check for multicollinearity.

    ------------------------------
    Michael Sullivan
    Joliet Junior College
    ------------------------------



  • 7.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-26-2020 07:45
    Here are a few aimed more at students than statisticians:

    1. Do not put a nominally coded variable with 3+ levels (like race) into a model as if it were numeric (yes, I've seen this done at least twice). Students also do it for Pearson correlations.

    2. Double check what your statistical program thinks is the direction of the coding for both the dependent variable (logistic) and the independent variables (especially logistic, but could be others). Amazing -- smokers have less lung cancer! ... Maybe not.

    3. If you let the program create dummy codes, be sure you know what it is doing (there are several ways to dummy code) and be sure the reference group has a decent sample size. SAS does "effects coding" by default, which, IMHO, creates a big mess.

    4. Always check the results for anomalies! No, it is NOT normal to have a non-significant odds ratio of 53,000,000 from logistic. And are you sure that standardized regression weight of 12 isn't a glitch?
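
    To make points 1 and 3 concrete, here is a minimal sketch of reference-level (indicator) dummy coding done by hand, so that the reference group and its size are explicit rather than left to a program default. The function name, the `min_reference_n` parameter, and the group labels are hypothetical.

```python
from collections import Counter


def dummy_code(values, reference, min_reference_n=1):
    """Indicator (0/1) columns for every level except the reference level."""
    counts = Counter(values)
    if counts[reference] < min_reference_n:
        raise ValueError(
            f"reference level {reference!r} has only {counts[reference]} rows"
        )
    return {
        f"is_{level}": [int(v == level) for v in values]
        for level in sorted(counts)
        if level != reference
    }


# Hypothetical three-level nominal variable with "A" as the reference group.
groups = ["A", "B", "C", "A", "B", "A"]
for name, column in dummy_code(groups, reference="A").items():
    print(name, column)
```

    A three-level variable becomes two 0/1 columns, and each coefficient is then read as a contrast against the reference group, not as the effect of a one-unit step on an arbitrary numeric code.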

    Ed

    ------------------------------
    Edward Gracely
    Drexel University
    ------------------------------



  • 8.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-26-2020 17:41
    Glen - 

    I'm not sure that pithy is always very helpful, though when appropriate, I'm sure it can be.  

    Well, I'm not a teacher, but I have some problems with both of the first two "rules":

    - Check if your errors are normally distributed.
    - Check for homoscedasticity

    The first is not a very strict requirement, and the second (homoscedasticity) should not be expected to occur:

    https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity

    https://www.researchgate.net/publication/333642828_Estimating_the_Coefficient_of_Heteroscedasticity

    The following is on the fundamental nature and magnitude of heteroscedasticity, for regressions of form y = y* + e, most useful for predictions for finite populations:  
    https://www.researchgate.net/project/OLS-Regression-Should-Not-Be-a-Default-for-WLS-Regression

    But you need homoscedasticity for hypothesis tests? Well, hypothesis tests are overly relied upon and often misused, frequently by looking only at p-values, which are virtually useless as stand-alone numbers. So one questionable practice (assuming or forcing homoscedasticity - perhaps with a transformation which may obscure your picture) leads to another (overreliance on often misinterpreted hypothesis tests).

    Note that my first link above discusses an idea from Ken Brewer's book, Combining Survey Sampling Inference: Weighing Basu's Elephants, Arnold/Oxford, 2002.
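
    For readers who want to see what using WLS instead of OLS looks like in the simplest case, here is a minimal sketch. It assumes a one-predictor model through the origin, y = b*x + e, with error variance proportional to x^(2*gamma). That model, the function name, and the numbers are illustrative assumptions and are not taken from the linked papers.

```python
def wls_slope_through_origin(xs, ys, gamma=0.5):
    """Weighted least squares slope for y = b*x + e.

    Assumes Var(e) is proportional to x**(2*gamma), so each point gets
    weight 1 / x**(2*gamma). gamma = 0 reduces to OLS through the origin.
    Requires every x to be positive.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("all x values must be positive")
    weights = [x ** (-2 * gamma) for x in xs]
    numerator = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denominator = sum(w * x * x for w, x in zip(weights, xs))
    return numerator / denominator


# Hypothetical data, made up for illustration.
xs = [1, 2, 4, 8]
ys = [3, 5, 9, 20]
print("OLS (gamma = 0):  ", wls_slope_through_origin(xs, ys, gamma=0.0))
print("WLS (gamma = 0.5):", wls_slope_through_origin(xs, ys, gamma=0.5))
```

    With gamma = 0.5 the slope equals sum(y) / sum(x), the classical ratio estimator, so the choice of gamma changes how much the largest x values drive the fit.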

    Thanks.


    ------------------------------
    James Knaub (Jim)
    Retired Lead Mathematical Statistician
    ------------------------------



  • 9.  RE: (Pithy) Beginner's Tips for Regression

    Posted 06-29-2020 07:42

    When teaching Regression, I always include my Rule of Outliers:

    • One outlier is a data error
    • Two outliers are outliers
    • Three outliers make a cluster

    While of course the numbers aren't intended to be exact, the principle is instructive: rare and radically different data points are often underlying errors in capturing the data. These could be dismissed, but it would be better to try to resolve the underlying error in data capture if possible. Real outliers can be dealt with as outliers - that is, included in the analysis, not dismissed. Various methods for robust / weighted regression can be applied. A lot of outliers with similar characteristics aren't outliers at all: they comprise a separate group that needs to be modeled separately.

    The real idea of my Rule of Outliers is to encourage students to look at the data carefully before attempting regression methods, and then apply modeling methods best suited to the particular data at hand.  
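
    As one example of a robust method that keeps a real outlier in the analysis, here is a minimal sketch of the Theil-Sen estimator (median of pairwise slopes). It is chosen here only because it is short; it is not necessarily the method used in the course described above, and the data are made up.

```python
from itertools import combinations
from statistics import median


def theil_sen(xs, ys):
    """Theil-Sen line: median of pairwise slopes, then median intercept."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with tied x values
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return intercept, slope


# Hypothetical data: five points on y = 2x + 1 and one wild value at x = 5.
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 100]
intercept, slope = theil_sen(xs, ys)
print("intercept:", intercept, "slope:", slope)
```

    The outlier stays in the data, yet the fitted line follows the five clean points, because a single extreme value cannot move the median of the pairwise slopes.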



    ------------------------------
    David J Corliss, PhD
    Director, Peace-Work www.peace-work.org
    davidjcorliss@peace-work.org
    ------------------------------