ASA Connect


History of Multiple Regression

  • 1.  History of Multiple Regression

    Posted 01-26-2021 18:45
    I'm looking for resources (articles, papers, books) that discuss the history of multiple regression analysis. Specifically, we're looking for timelines, major contributors, achievements, progress, and pitfalls in the development of these techniques, concepts, and applications. I'd appreciate any help you can provide.

    Thanks.

    Dr. Arunas Dagys
    Saint Xavier University
    Chicago, Illinois

    ------------------------------
    Arunas Dagys
    St. Xavier University
    ------------------------------


  • 2.  RE: History of Multiple Regression

    Posted 01-27-2021 08:14
    Edited by Glen Colopy 01-27-2021 08:28
    Hi Arunas,
    Thanks for starting this interesting thread!

    Consider these two articles...
    [1] A tutorial history of least squares with applications to astronomy and geodesy
    https://www.sciencedirect.com/science/article/pii/S0377042700003435

    [2] Gauss and the Invention of Least Squares
    https://projecteuclid.org/euclid.aos/1176345451

    Both are freely available.

    I know of a third article... I think it was in Popular Science back in the '60s and described the thinking behind Kalman filters and least squares as part of a tutorial. (Please put a "?" next to each of the specifics in that last sentence.) It's really good, but I can't find it.

    Looking forward to seeing other people's suggestions as well!

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    Data Scientist at Cenduit LLC, Durham, NC
    ------------------------------



  • 3.  RE: History of Multiple Regression

    Posted 01-28-2021 06:48
    What type of history are you trying to get to? 

    Penrose wrote about the pseudo-inverse (at least, that's what I learned in Linear Algebra) as a way to get coefficients for overdetermined systems, like those we encounter every day in stats.
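    A small illustration of the pseudo-inverse point, as a plain-Python sketch (the data are hypothetical, and helper names such as pseudo_inverse are mine, not from any library). When the design matrix X has full column rank, the Moore-Penrose pseudo-inverse reduces to (X^T X)^{-1} X^T, so multiplying it by y gives the ordinary least-squares coefficients of an overdetermined system. The general pseudo-inverse (e.g. computed via the SVD) also handles rank-deficient X; this sketch does not.

```python
# Sketch: least squares for an overdetermined system via the pseudo-inverse.
# Assumes X has full column rank, so X+ = (X^T X)^{-1} X^T.
# Two columns (intercept, slope) keep the matrix inverse explicit.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    # Plain nested-list matrix product.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("X^T X is singular; X is not full column rank")
    return [[d / det, -b / det], [-c / det, a / det]]

def pseudo_inverse(x):
    """X+ = (X^T X)^{-1} X^T for a two-column, full-column-rank X."""
    xt = transpose(x)
    return matmul(inverse_2x2(matmul(xt, x)), xt)

# Hypothetical data: 4 equations, 2 unknowns.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [[1.0], [3.0], [4.0], [6.0]]

X_plus = pseudo_inverse(X)
beta = matmul(X_plus, y)   # least-squares intercept and slope (2 x 1)
print(beta)
```

    For these four points the result is an intercept of 1.1 and a slope of 1.6 (up to floating-point rounding), the same values an ordinary simple regression gives.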

    My Management Science students do a couple of problems based on "non-linear" optimization, which is essentially a sum-of-squares method that is how many years old? If you run the algorithm on data from an MLR, you get the same coefficients as MLR. The nice thing about the problems they get is that the method doesn't care about missing data. No matter how much missing data you have, you still get nearly the same coefficients as you did before. No error messages. No removal of rows of data. No imputing of the missing data. It just does what it's told to do.
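    To make the "same coefficients" point concrete, here is a minimal sketch that minimizes the sum of squared errors numerically with plain gradient descent. Gradient descent is an assumption on my part, standing in for whatever solver a course actually uses; the data are hypothetical and noise-free, so the least-squares answer (2, 3, -1) is known in advance.

```python
# Sketch: fit an MLR by numerically minimizing SSE with gradient descent
# rather than solving the normal equations in closed form.

def sse(beta, rows, y):
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(rows, y))

def fit_by_gradient_descent(rows, y, lr=0.005, steps=20000):
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for row, yi in zip(rows, y):
            resid = yi - sum(b * x for b, x in zip(beta, row))
            for j in range(p):
                grad[j] += -2.0 * resid * row[j]   # d(SSE)/d(beta_j)
        beta = [b - lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical noise-free data from y = 2 + 3*x1 - 1*x2.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]   # leading 1.0 = intercept column
y = [2 + 3 * a - b for a, b in zip(x1, x2)]

beta = fit_by_gradient_descent(rows, y)
print([round(b, 6) for b in beta], sse(beta, rows, y))
```

    The step size has to be small relative to the curvature of the SSE surface; with these data a much larger lr would make the iterations diverge.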

    In both of these cases, the ideas are central to the analysis of MLR, but neither is usually covered in a typical stats textbook.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 4.  RE: History of Multiple Regression

    Posted 01-28-2021 07:10
    Hey Andrew,
    Good points!
    Not to hijack this thread, but I like the optimization exercise you described for your students.
    I've long thought that students are taught incorrectly when they are simply handed the analytical solutions to MLE right from the start. It makes them think that the inference procedure is synonymous with the model itself, when in reality those are separate issues.

    They shouldn't be handed analytical results; instead, they should build their intuition about parameter inference by optimizing (computationally) a set of objective functions. SSE and the likelihood function would just happen to be two of those objective functions.
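    One way such an exercise might look, as a sketch with hypothetical data and function names (golden-section search is just one simple choice of optimizer): estimate a single location parameter by handing three different objective functions to the same generic minimizer. SSE and the Gaussian negative log-likelihood with a fixed sigma share a minimizer (the sample mean), while the sum of absolute errors gives the sample median, which shows that the objective, not the model, determines the estimate.

```python
import math

def golden_section_minimize(f, lo, hi, tol=1e-9):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2          # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d                              # minimum lies in [a, d]
        else:
            a = c                              # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

data = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical sample with one large value

def sse(m):
    return sum((x - m) ** 2 for x in data)

def gaussian_nll(m, sigma=1.0):
    # Negative log-likelihood of i.i.d. Normal(m, sigma^2) data, sigma held fixed.
    n = len(data)
    return n * math.log(sigma * math.sqrt(2 * math.pi)) + sse(m) / (2 * sigma ** 2)

def sum_abs(m):
    return sum(abs(x - m) for x in data)

for name, f in [("SSE", sse), ("Gaussian NLL", gaussian_nll), ("sum |error|", sum_abs)]:
    print(name, round(golden_section_minimize(f, 0.0, 20.0), 6))
```

    With these five values the first two objectives give approximately 4 (the mean) and the absolute-error objective gives approximately 3 (the median).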

    (I've tried thinking of some equivalent computational exercises for integrating over the parameter space, but they tend to revolve around optimizing out-of-sample performance....which I'm worried would distract early learners from the important learning takeaways.)

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    Data Scientist at Cenduit LLC, Durham, NC
    ------------------------------



  • 5.  RE: History of Multiple Regression

    Posted 01-30-2021 21:35
    Andrew,
    My goal here is to advise an undergrad student writing a paper on multiple regression who is looking for some historical background to introduce the subject, and who also wants to identify contributors, timelines, and milestones in its development. For context, the student is a senior mathematics major at SXU, which is a small school with a relatively small mathematics department offering majors in Math, Math/Secondary Education, and Actuarial Science. Our majors do a senior seminar, which is where this question arose.

    Thanks for your help.

    Arunas

    ------------------------------
    Arunas Dagys
    St. Xavier University
    ------------------------------



  • 6.  RE: History of Multiple Regression

    Posted 01-30-2021 21:20
    Glen,
    Thanks for your help!

    Arunas

    ------------------------------
    Arunas Dagys
    St. Xavier University
    ------------------------------



  • 8.  RE: History of Multiple Regression

    Posted 01-28-2021 11:45
    Edited by Hans Kiesl 01-28-2021 11:45
    Arunas,

    Why not write a book about it? I'd be happy to read it :-)

    Some chapters I would like to see:

    1. Least squares (Gauss and Legendre)
    2. Origin of the term "regression" (Galton)
    3. Moving regression away from correlation (Yule, Fisher)
    4. MLR in matrix notation (Aitken)
    5. Evolution of computer programs for MLR (I have no idea about that)
    6. Generalized linear models (Nelder/Wedderburn 1972)
    7. Regression with censored data (Cox 1972)
    8. More econometrics stuff (IV regression, time series regression)
    9. LASSO (Tibshirani 1996) and other regularization techniques.
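    For chapter 4, the matrix form is compact enough to state directly. These are the standard textbook results rather than anything from the posts here: with a full-column-rank design matrix X, ordinary least squares corresponds to Var(ε) = σ²I, and Aitken's generalized least squares handles a known covariance structure Ω.

```latex
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 \Omega, \\
\hat\beta_{\text{OLS}} &= (X^\top X)^{-1} X^\top y \qquad (\Omega = I), \\
\hat\beta_{\text{GLS}} &= (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} y .
\end{aligned}
```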

    There is a lot of literature about (1), e.g. in Stigler's book "The History of Statistics", or the papers cited by Glen.


    About (2), you might look at

    Prakash Gorroochurn (2016), On Galton's Change From "Reversion" to "Regression", The American Statistician, 70:3, 227-231

    Concerning (3),

    John Aldrich (2005), Fisher and Regression, Statistical Science Vol. 20, No. 4, 401–417

    Hilary Seal (1967), The historical development of the Gauss linear model, Biometrika Vol. 54, No.1-2, 1-24.


    I'm not aware of a book about the historical development of the more recent subjects. Maybe in the econometrics literature? I would consider the papers by Nelder/Wedderburn, Cox, Tibshirani as major breakthroughs in the field of (multiple) regression. I'm sure others will have lots of additions to this list.

    -Hans-



    ------------------------------
    Hans Kiesl
    Regensburg University of Applied Sciences
    Germany
    ------------------------------



  • 9.  RE: History of Multiple Regression

    Posted 01-30-2021 21:28
    Hans,
    Thanks for your help, as well as for giving me an outline of what might be an interesting, albeit lengthy, book. My goal here is quite a bit smaller, namely to advise an undergrad student writing a paper on multiple regression who is looking for some historical background to introduce the subject, and who also wants to identify contributors, timelines, and milestones in its development.

    Arunas

    ------------------------------
    Arunas Dagys
    St. Xavier University
    ------------------------------



  • 10.  RE: History of Multiple Regression

    Posted 02-02-2021 17:01
    Dear Arunas,
    As others have mentioned, I think your proposal to do a history of Multiple Linear Regression would prove to be of interest to many readers.
    I believe I can provide an early application (in physiology) of multiple linear regression, which appears in a manuscript written over 100 years ago! Here is the background: a few years ago, a nutritionist told me of a formula which predicts the human metabolic rate as a function of age, height, and weight. The nutritionist told me that as we get older, our metabolic rate (as predicted by the formula) slows by up to 2% per decade. In other words, a 70-year-old's metabolic rate might be as much as 10% lower than a 20-year-old's. Put another way, a person could gain weight with age unless they exercise a little more and/or eat more healthily!
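    For readers who want to see the shape of the equation, here is a sketch using the coefficients of the original Harris-Benedict equations as they are commonly quoted (weight in kg, height in cm, age in years, output in kcal/day). I have not checked these numbers against the original paper, so treat them as an assumption and verify them there before citing. The example person is hypothetical.

```python
# Sketch: Harris-Benedict basal metabolic rate, a multiple linear regression
# in weight, height, and age with separate coefficients by sex.
# ASSUMPTION: coefficients as commonly quoted for the original equations;
# check them against the original Harris-Benedict publication.

COEFFICIENTS = {
    # sex: (intercept, per kg of weight, per cm of height, per year of age)
    "male": (66.4730, 13.7516, 5.0033, -6.7550),
    "female": (655.0955, 9.5634, 1.8496, -4.6756),
}

def harris_benedict(sex, weight_kg, height_cm, age_years):
    """Predicted basal metabolic rate in kcal/day."""
    b0, b_weight, b_height, b_age = COEFFICIENTS[sex]
    return b0 + b_weight * weight_kg + b_height * height_cm + b_age * age_years

# Hypothetical man: 70 kg, 175 cm, at ages 20 and 30.
bmr_20 = harris_benedict("male", 70, 175, 20)
bmr_30 = harris_benedict("male", 70, 175, 30)
print(bmr_20, bmr_20 - bmr_30)
```

    For this hypothetical man the sketch gives about 1769.6 kcal/day at age 20. Because age enters linearly, each additional decade lowers the prediction by a fixed 67.55 kcal/day for men (46.756 for women) regardless of weight and height, so the percentage change per decade depends on body size.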
    The nutritionist told me the formula is called the "Harris-Benedict Equation".  As mentioned above, it is over 100 years old, but the formula is evidently still used in research and clinical work.  I was curious what this formula looked like, and googled it.  The first link below refers to one of the original papers by Harris and Benedict; the formula appears on the last page. Interestingly enough, the formula  was indeed a multiple linear regression from 100 years ago! Separate formulas appear for males and females. Before computers and statistical packages, doing a multiple regression was a fairly intensive calculation. If one looks at old statistics textbooks going back to, say, the 1950's or earlier, one would find a method by Doolittle; this was a computing plan to help facilitate the many calculations. (For example, see Walker and Lev's textbook, Statistical Inference, 1953)  
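    Doolittle's plan was a tabular layout for hand calculation; in modern terms it roughly corresponds to factoring the normal-equation matrix X^T X into a unit lower-triangular L and an upper-triangular U, then solving by forward and back substitution. The following is a minimal plain-Python sketch of that matrix form, not a reproduction of the textbook worksheet; the data and function names are hypothetical. No pivoting is done, which is acceptable here because X^T X is symmetric positive definite when X has full column rank.

```python
# Sketch: solve the normal equations (X^T X) b = X^T y with a Doolittle-style
# LU factorization (L has ones on its diagonal), no pivoting.

def normal_equations(rows, y):
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return xtx, xty

def doolittle_lu(a):
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i, n):            # row i of U
            U[i][k] = a[i][k] - sum(L[i][j] * U[j][k] for j in range(i))
        L[i][i] = 1.0
        for k in range(i + 1, n):        # column i of L
            L[k][i] = (a[k][i] - sum(L[k][j] * U[j][i] for j in range(i))) / U[i][i]
    return L, U

def solve_lu(L, U, b):
    n = len(b)
    z = [0.0] * n
    for i in range(n):                   # forward substitution: L z = b
        z[i] = b[i] - sum(L[i][j] * z[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):         # back substitution: U x = z
        x[i] = (z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Hypothetical data: intercept plus two predictors.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
y = [1.1, 4.9, 6.2, 9.8, 11.1, 14.9]

xtx, xty = normal_equations(rows, y)
L, U = doolittle_lu(xtx)
beta = solve_lu(L, U, xty)
print([round(b, 4) for b in beta])
```

    A quick check is that the residuals are orthogonal to every column of X, which is exactly what the normal equations require.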
    According to a Wikipedia page on Harris (second link below), he was a botanist and a biometrician; in fact, he was elected a Fellow of the ASA back in 1922! Benedict was apparently a nutrition scientist. A paper on the history of Harris and Benedict's work (third link below) indicates that the equation is one of the first applications of multiple linear regression to human physiology. I suspect it may also have been, more broadly, one of the first in the biological and medical sciences.
    Hope you find this information helpful.
    Sincerely,
    Martin Feuerman
    Retired Biostatistician


    ------------------------------
    Martin Feuerman
    ------------------------------



  • 11.  RE: History of Multiple Regression

    Posted 02-02-2021 17:14
    It's a good thing Benedict & Harris didn't get a typical journal reviewer of today...

    *Reads their submission*
    *Leafs through Gauss-Legendre debate*
    REJECT: No novel statistical methodology.

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    Data Scientist at Cenduit LLC, Durham, NC
    ------------------------------



  • 12.  RE: History of Multiple Regression

    Posted 02-02-2021 17:53
    Could be worse. "You can't change more than one thing at a time during an experiment. Statistics DOESN'T ALLOW IT!!!!" 
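    The quip has a standard counterexample: in a factorial experiment several factors are changed at once by design, and a regression on the coded factors separates their effects. A minimal sketch with a hypothetical 2 × 2 experiment (the responses are invented for illustration):

```python
# Hypothetical 2x2 full factorial: factors A and B coded -1/+1,
# both changed between runs, one response per run.
runs = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
response = [10.0, 14.0, 11.0, 19.0]

# Model columns: intercept, A, B, and the A*B interaction.
columns = {
    "intercept": [1 for _ in runs],
    "A": [a for a, _ in runs],
    "B": [b for _, b in runs],
    "A*B": [a * b for a, b in runs],
}

# The coded columns are mutually orthogonal, so each least-squares
# coefficient reduces to sum(column * response) / number_of_runs.
n = len(runs)
coef = {name: sum(c * y for c, y in zip(col, response)) / n
        for name, col in columns.items()}
print(coef)
```

    Here the fitted coefficients are 13.5 (intercept), 3.0 (A), 1.5 (B), and 1.0 (A*B). With four runs and four coefficients the model fits exactly, leaving nothing to estimate error from; a real experiment would add replicate runs or drop the interaction term.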

    You have to watch it with some of those older formulas in science. They do weird things like, "To approximate the weight of a person, assume they are a box... Their density is roughly the same as water." 

    They also use lots of simplifying approximations. You might have a quadratic response, but they will use an exponential model because everything in science is either linear or exponential.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 13.  RE: History of Multiple Regression

    Posted 02-04-2021 11:09
    Stephen Stigler, The History of Statistics: The Measurement of Uncertainty before 1900, pp. 11-61

    ------------------------------
    Michael Sack Elmaleh
    Principal
    Michael Sack Elmaleh CPA, CVA
    ------------------------------



  • 14.  RE: History of Multiple Regression

    Posted 02-04-2021 14:46
    Jon,
    See forward.  I'm still getting some info from the post.

    Dr. Dagys