ASA Connect


early examples of spurious correlation

  • 1.  early examples of spurious correlation

    Posted 05-16-2017 22:19
    Somewhere in my long-term memory is a dim recollection of what I thought of as one of the oldest, if not the oldest, examples of a spurious correlation.

    What I recall may have come from Kendall & Stuart: it concerns the relationship between the number of radios licensed in the UK (x) and the number of people with mental illness (y), with a strong positive correlation. The punch line, of course, was that over time both x and y were increasing, but for different reasons.

    Can anyone supply a reference for this, or an alternative early reference to an example of spurious correlation?

    --
    Michael Friendly Email: friendly AT yorku DOT ca
    Professor, Psychology Dept. & Chair, Quantitative Methods
    York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
    4700 Keele Street Web: http://www.datavis.ca
    Toronto, ONT M3J 1P3 CANADA


  • 2.  RE: early examples of spurious correlation

    Posted 05-17-2017 09:08
    When teaching regression, I used to quote Mark Twain's Life on the Mississippi (originally published in 1883, Chapter 17).  So far as I know, this quotation (attached) is the earliest example of nonsensical correlation.  He wrote the book when he was still Samuel Clemens and taught high school mathematics.

    Ajit K. Thakur, Ph.D.
    Retired

    ------------------------------
    Ajit Thakur
    Associate Director
    ------------------------------

    Attachment: Mark Twain Quote.docx


  • 3.  RE: early examples of spurious correlation

    Posted 05-17-2017 09:32
    I cannot give you a reference (I think it was given in an old Penguin book on statistics by Moroney), but an early study showed that in some towns in Germany there was a high correlation between the number of storks and the number of babies born.

    ------------------------------
    Robert Elston
    [Emeritus Professor,
    Case Western Reserve University]
    ------------------------------



  • 4.  RE: early examples of spurious correlation

    Posted 05-17-2017 11:30
    Oldenburg, Germany.  See page 8 (Figure 1.4) and its reference in the first (1978) edition of Box, Hunter, Hunter.

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org
    ------------------------------



  • 5.  RE: early examples of spurious correlation

    Posted 05-17-2017 09:43
    The earliest example of spurious Pearson product-moment correlation that I am aware of is a paper by Pearson himself in 1897, "On a Form of Spurious Correlation which may arise when Indices are used in the Measurement of Organs" (paper attached).

    In general, it measures the magnitude of the spurious correlation that arises when ratios of independent measurements are considered. The specific case here is organ/body length ratios of shrimp, reported in an earlier paper by W.F.R. Weldon (who comments on the Pearson paper in the same issue).

    This is a fascinating early consideration of compositional data analysis, and cited often in the CoDA literature (i.e. by the John Aitchison "school").
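
    The effect of forming ratios can be checked with a short simulation. The sketch below is an illustration only and is not taken from the paper: it assumes three independent, normally distributed measurements with the same mean and spread (the names x, y, z, the sample size, and the distribution parameters are made up for the example) and compares the correlation of x with y against the correlation of x/z with y/z.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_ratio_correlation(n=20000, seed=1):
    """Return (r of x and y, r of x/z and y/z) for independent x, y, z.

    Assumption for this sketch: all three measurements are drawn
    independently from a normal distribution with mean 10 and sd 1,
    so they share the same coefficient of variation (0.1).
    """
    rng = random.Random(seed)
    x = [rng.gauss(10.0, 1.0) for _ in range(n)]
    y = [rng.gauss(10.0, 1.0) for _ in range(n)]
    z = [rng.gauss(10.0, 1.0) for _ in range(n)]

    r_raw = pearson(x, y)
    # Dividing both measurements by the same z gives them a shared component.
    ratio_x = [a / c for a, c in zip(x, z)]
    ratio_y = [b / c for b, c in zip(y, z)]
    r_ratio = pearson(ratio_x, ratio_y)
    return r_raw, r_ratio


if __name__ == "__main__":
    r_raw, r_ratio = simulate_ratio_correlation()
    print(f"corr(x, y)     = {r_raw:.3f}")
    print(f"corr(x/z, y/z) = {r_ratio:.3f}")
```

    Under these assumptions the raw measurements should be essentially uncorrelated, while the two ratios should show a correlation in the neighborhood of 0.5, produced entirely by the shared denominator.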

    ------------------------------
    Zachary Kurtz
    ------------------------------



  • 6.  RE: early examples of spurious correlation

    Posted 05-17-2017 09:44
    Plot the populations of polar bears vs penguins. 

    You can also discuss why storks bring babies. (Think, 9 months before the storks arrive in Germany is???)

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 7.  RE: early examples of spurious correlation

    Posted 05-17-2017 11:42
    There's an entire book devoted to that:  Tyler Vigen (2015), "Spurious Correlations."  It's wonderful.  Examples from the book:
    Ozone near ground level is 88.7% correlated with functionally obsolete bridges, by year;
    Number of Atlantic hurricanes is 79.2% correlated with draft picks by the Boston Celtics;
    "Two and a Half Men" ranking against other CBS shows is 81.4% correlated with the NY stock exchange (highest composite price);
    Harley-Davidson's motorcycle revenue is 82.1% correlated with the U. S. fertility rate.  Hmm... maybe they have something there...
    Gross revenue from U.S. symphony orchestras is 89.2% correlated with juvenile arrests for pot possession.

    ------------------------------
    George Yost
    ------------------------------



  • 8.  RE: early examples of spurious correlation

    Posted 05-17-2017 11:34
    Hello Everyone!

    This is not an answer to the original question, but an example of a very nice spurious correlation, 'Ice cream and IQ', that appeared in «The Economist». Not only is it an example of a spurious correlation, but it is not even an example of a well-defined correlation, since the 2 variables are not (apparently) linearly related. Thus a double example... interesting for teaching purposes!

    http://wikistat.mgi.polymtl.ca/tiki-download_file.php?fileId=358


    ------------------------------
    Marc Bourdeau
    Ecole Polytechnique
    ------------------------------



  • 9.  RE: early examples of spurious correlation

    Posted 05-17-2017 13:41
    I remember my favorite, an early example I read in some stat book: a correlation of .93 between the annual number of plumbers in the plumbers union in the U.S. and the annual number of inches of rain in the Indian province of Hyderabad from 1913 to 1924.

    ------------------------------
    Milton Goldsamt
    Consulting Research Statistician
    ------------------------------



  • 10.  RE: early examples of spurious correlation

    Posted 05-17-2017 13:59
    Michael,
    You might want to distinguish time-based spurious correlations (the topic of Vigen's book) from "cross-sectional" spurious correlations. A Pearson correlation on almost any non-stationary process is going to be rather large, but that's not what most people care about. The Vigen examples are hilarious (especially the ones on his website), but they are all attributable to the misuse of correlations on time series. More interesting ("significant") are examples that appear to meet the assumptions of computing a bivariate correlation but are due to lurking variables. I'm sure there are some old examples of this sort that people here would know.
    Best,
    Lee
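
    The point about non-stationary series can be illustrated with a small simulation. This is only a sketch under stated assumptions: two series that each follow their own linear trend plus independent noise (the slopes, noise level, series length, and all function names are made up for the example). It compares the Pearson correlation of the levels with the correlation of the first differences, where the shared upward drift is removed.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def trending_series(n, slope, noise_sd, rng):
    """Linear trend in time plus independent Gaussian noise (hypothetical data)."""
    return [slope * t + rng.gauss(0.0, noise_sd) for t in range(n)]


def first_differences(series):
    """Change from one time point to the next; removes a linear trend's slope."""
    return [b - a for a, b in zip(series, series[1:])]


def compare_levels_and_differences(n=500, seed=7):
    """Return (r of the levels, r of the first differences)."""
    rng = random.Random(seed)
    # The two series share nothing except that both increase over time.
    x = trending_series(n, 0.10, 2.0, rng)
    y = trending_series(n, 0.08, 2.0, rng)
    r_levels = pearson(x, y)
    r_diffs = pearson(first_differences(x), first_differences(y))
    return r_levels, r_diffs


if __name__ == "__main__":
    r_levels, r_diffs = compare_levels_and_differences()
    print(f"correlation of levels:      {r_levels:.3f}")
    print(f"correlation of differences: {r_diffs:.3f}")
```

    In this setup the levels should correlate very strongly simply because both series drift upward, while the differenced series should show a correlation close to zero.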

    ------------------------------
    Leland Wilkinson
    H2O
    ------------------------------



  • 11.  RE: early examples of spurious correlation

    Posted 05-17-2017 14:29

    I found the following in an intro statistics text, but I no longer remember which one:


    A male college student goes out drinking one night and drinks only scotch and water, overdoes it, and wakes up with a headache and hangover. He goes out drinking the next night but switches to Canadian whiskey and water and wakes up with a headache and hangover. Undaunted, he goes out drinking a third night, drinks only Kentucky bourbon and water, and wakes up with a headache and hangover. He then concludes that the water is causing the headaches and hangovers.







  • 12.  RE: early examples of spurious correlation

    Posted 05-18-2017 05:17
    This example can be found in Darrell Huff's magnificent little book "How to Lie With Statistics", 31st printing 1954:

    "The death rate in the Navy during the Spanish-American War was nine per thousand. For civilians in New York City during the same period it was sixteen per thousand. Navy recruiters later used these figures to show that it was safer to be in the Navy than out of it."

    If one doesn't think too deeply about the details of how these values arose then enlisting in the Navy seems to be quite a reasonable choice!





  • 13.  RE: early examples of spurious correlation

    Posted 05-18-2017 22:40
    One of my favorite lurking variable spurious correlations: Children's shoe size and their score on a standardized reading test.
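
    A simulation makes the lurking variable visible. The sketch below is hypothetical throughout: it assumes that age drives both shoe size and reading score, with made-up coefficients and noise levels, and that the two outcomes are otherwise unrelated. It reports the raw correlation and the first-order partial correlation after controlling for age.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """First-order partial correlation of xs and ys, controlling for zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_shoe_size_and_reading(n=5000, seed=3):
    """Return (raw r, partial r given age) for made-up shoe/reading data."""
    rng = random.Random(seed)
    # Assumed data-generating process: age is the only common cause.
    age = [rng.uniform(6.0, 12.0) for _ in range(n)]
    shoe = [1.0 * a + rng.gauss(0.0, 1.0) for a in age]
    reading = [10.0 * a + rng.gauss(0.0, 10.0) for a in age]
    return pearson(shoe, reading), partial_correlation(shoe, reading, age)


if __name__ == "__main__":
    r_raw, r_partial = simulate_shoe_size_and_reading()
    print(f"corr(shoe size, reading score)       = {r_raw:.3f}")
    print(f"partial corr, controlling for age    = {r_partial:.3f}")
```

    With these assumptions the raw correlation should be strongly positive, and the partial correlation given age should fall to roughly zero.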

    ------------------------------
    Martha Smith
    University of Texas
    ------------------------------



  • 14.  RE: early examples of spurious correlation

    Posted 05-17-2017 14:42
    I know someone mentioned Tyler Vigen's book, but here's a quick look at many on his website (shout out and thanks to my CSP mentee, Naomi Brownstein, who first directed me to it!).
    15 Insane Things That Correlate With Each Other



    ------------------------------
    Mary Kwasny
    Associate Professor
    ------------------------------



  • 15.  RE: early examples of spurious correlation

    Posted 05-18-2017 17:51

    A long time ago, I heard of one that caused a good many chuckles: in 18th-century New England, there was a high correlation between Protestant ministers' income and the consumption of rum from the Caribbean. Of course, both reflected the level of economic prosperity. Sorry, I don't recall the value of the coefficient.

    --Bob Riffenburgh