ASA Connect


p-values: a fable

  • 1.  p-values: a fable

    Posted 04-09-2017 14:14

    Once upon a time, there was a brilliant scientist who invented a way to tell the difference between signal and noise in messages from data.  His method was not perfect, but it worked well provided the message had been carefully crafted in advance to fit a particular format.

     

    Unfortunately, because the method could be purged of ideas and reduced to a single number, well-meaning journal editors were led to declare, "Our journal will publish only signal, never noise."

     

    Understandably, well-meaning authors, under pressure to publish, were motivated to make their messages look to editors like signal, never noise.

     

    Reality intruded.  Because many messages had not been carefully crafted, their content was mostly noise.

     

    Thus was born the industry of trying to make noise look like signal.  (This last is a lie, of course, or at least an anachronism.  The industry of making noise look like signal was invented by politicians thousands of years before Fisher.  Statisticians merely fortified the hoax by attaching numbers.)

     

    The moral:  Teaching the p-value as an abstract mathematical construct stripped of its context mainly contributes to the noise in the channel of science.  Detecting the signal requires thought.  Study design matters.


    George Cobb



  • 2.  RE: p-values: a fable

    Posted 04-10-2017 07:09

    Well said.  It strikes me that there is much in  the recent assault on p-values that is akin to throwing the baby out with the bath water.  I used to use the analogy of morphine.  Just because morphine can be abused doesn't mean it is always abused.  Now, unfortunately, opioids have become like p-values.






  • 3.  RE: p-values: a fable

    Posted 04-11-2017 12:56
    Historically, I think the problem has been 1) that statisticians and others have blamed p-values and significance assessment themselves for their misuse and misinterpretation by those who did not understand them and their limitations, and 2) that most statisticians and other scientists have continued to use fixed alphas and the 'significant/non-significant' jargon long after the illogicality of those practices was clear to anyone who delved into the literature at all. I'd be interested to know of any flaws in my and a colleague's review of a few years ago.  See:

    Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the NeoFisherian

    by Stuart H. Hurlbert and Celia M. Lombardi

    http://www.sekj.org/PDF/anz46-free/anz46-311.pdf

    ABSTRACT:  This essay grew out of an examination of one-tailed significance testing. One-tailed tests were little advocated by the founders of modern statistics but are widely used and recommended nowadays in the biological, behavioral and social sciences. The high frequency of their use in ecology and animal behavior and their logical indefensibility have been documented in a companion review paper. In the present one we trace the roots of this problem and counter some attacks on significance testing in general. Roots include: the early but irrational dichotomization of the P scale and adoption of the 'significant/non-significant' terminology; the mistaken notion that a high P value is evidence favoring the null hypothesis over the alternative hypothesis; and confusion over the distinction between statistical and research hypotheses. Resultant widespread misuse and misinterpretation of significance tests have also led to other problems, such as unjustifiable demands that reporting of P values be disallowed or greatly reduced and that reporting of confidence intervals and standardized effect sizes be required in their place. Our analysis of these matters thus leads us to a recommendation that for standard types of significance assessment the paleoFisherian and Neyman-Pearsonian paradigms be replaced by a neoFisherian one. The essence of the latter is that a critical α (probability of type I error) is not specified, the terms 'significant' and 'non-significant' are abandoned, that high P values lead only to suspended judgments, and that the so-called "three-valued logic" of Cox, Kaiser, Tukey, Tryon and Harris is adopted explicitly. Confidence intervals and bands, power analyses, and severity curves remain useful adjuncts in particular situations. Analyses conducted under this paradigm we term neoFisherian significance assessments (NFSA). Their role is assessment of the existence, sign and magnitude of statistical effects. The common label of null hypothesis significance tests (NHST) is retained for paleoFisherian and Neyman-Pearsonian approaches and their hybrids. The original Neyman-Pearson framework has no utility outside quality control type applications. Some advocates of Bayesian, likelihood and information-theoretic approaches to model selection have argued that P values and NFSAs are of little or no value, but those arguments do not withstand critical review. Champions of Bayesian methods in particular continue to overstate their value and relevance.

     



    ------------------------------
    Stuart Hurlbert
    Emeritus Professor of Biology
    San Diego State University
    ------------------------------



  • 4.  RE: p-values: a fable

    Posted 04-10-2017 08:31

    I.LOVE.THIS!

     

    You have boiled down the ever-present p-value issue like a Hemingway novel.  Short and to the point, but eloquent.

     

    May I call you Papa Cobb?

     

    Susan E. Spruill

    Susan E. Spruill, PStat®

    Statistical Consultant, President

    Applied Statistics and Consulting

    828-467-9184 (phone)

    Professional Statistician accredited by the American Statistical Association

    www.appstatsconsulting.com

     






  • 5.  RE: p-values: a fable

    Posted 04-10-2017 08:35
    George says, "The moral:  Teaching the p-value as an abstract mathematical construct stripped of its context mainly contributes to the noise in the channel of science.  Detecting the signal requires thought.  Study design matters."

    To this I would answer, correct, study design matters, BUT presenting our results in a measure that allows easy incorporation into a decision theoretic framework matters too.  No matter how good the study design is, a p-value is not useful to me if I can't blend it with a utility function that leads to a decision.  What we compute and distribute as a summary measure of our evidence matters too.


    ------------------------------
    Dalene Stangl
    Professor of the Practice
    Duke University
    ------------------------------



  • 6.  RE: p-values: a fable

    Posted 04-11-2017 12:03

    I have followed the discussion and related articles about p-values attentively.  What is worrisome is that p-values, which are derived from some test, are viewed in a vacuum rather than as part of the decision-making process.  Using p-values for decision-making requires a risk assessment.  What risk is the decision maker willing to accept (alpha, beta), and what difference is of practical significance (delta)?  I can perfectly understand the need for an alpha value different from the traditional 5%.
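
    As a rough illustration of how alpha, beta, and delta interact, the sketch below computes the approximate sample size for a two-sided one-sample z-test. The function name `sample_size`, the known standard deviation `sigma`, and the example values are assumptions chosen for illustration; nothing in this thread specifies them.

```python
import math
from statistics import NormalDist


def sample_size(alpha, beta, delta, sigma=1.0):
    """Approximate n for a two-sided one-sample z-test.

    alpha: accepted type I error risk
    beta:  accepted type II error risk (power = 1 - beta)
    delta: smallest difference considered practically significant
    sigma: standard deviation, assumed known (hypothetical default)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Same risks, two different choices of practical significance.
print(sample_size(0.05, 0.20, 0.5))
print(sample_size(0.05, 0.20, 0.1))
```

    Under these assumptions, shrinking delta by a factor of five raises the required sample size roughly twenty-five-fold, which is why delta belongs in the conversation alongside alpha and beta.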

    Interestingly, practical significance is often not mentioned. Given enough data, any difference can become statistically significant without being practically significant. From my perspective, the p-value is based on data, an objective value (setting aside measurement noise, model correctness, and right-sizing of the test) and the selected significance level, a subjective value. The risks selected by the decision maker are subjective values also. The selected practical significance directly affects the p-value. Therefore, in the absence of a decision-maker-selected significance level, comparing a p-value with the chosen alpha value becomes meaningless. In essence, the comparison will be based on statistical significance with a delta equal to zero, which may be of no practical significance.  I believe that many comments about the inappropriateness of p-values should be reviewed while considering practical significance and at least alpha risk.
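
    The point that enough data can make any difference statistically significant can be sketched with a two-sided z-test against a delta of zero. The observed difference of 0.01, the known `sigma` of 1, and the function name `two_sided_p` are hypothetical values picked only to show the pattern.

```python
import math


def two_sided_p(observed_diff, sigma, n):
    """Two-sided p-value of a one-sample z-test of 'difference = 0'."""
    z = observed_diff * math.sqrt(n) / sigma
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))


# The same tiny observed difference at three sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(0.01, 1.0, n))
```

    The observed difference never changes, yet the p-value falls from above 0.9 to far below any conventional alpha as n grows. Whether a difference of 0.01 matters is a question the p-value alone cannot answer.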

    It seems that the discussion about hypothesis testing and p-values follows a mechanical, black box approach, ignoring the role of the decision maker.  I believe that when the analyst has an in-depth discussion with the decision maker about significance and risk, using p-values is perfectly acceptable.







  • 7.  RE: p-values: a fable

    Posted 04-11-2017 12:32
    Dalene, it seems to me in real life applications, e.g. in clinical trials, formal decision theoretic frameworks and formal utility functions are never, in truth, used to make decisions about drug approvals, marketing, etc. as the decisions always will be based on the subjective weighing of many different pieces of information concerning major and minor benefits, major and minor negative side effects, magnitude of all measured effects, uncertainty about long term effects, external validity (e.g. generalizability of results to classes of patients not included in trials), etc. The pretense that the ultimate decisions can be truly objective and unaffected by both values and subjective assumptions seems to be just that, a pretense.

    ------------------------------
    Stuart Hurlbert
    Emeritus Professor of Biology
    San Diego State University
    ------------------------------



  • 8.  RE: p-values: a fable

    Posted 04-10-2017 10:30
    Well said.





  • 9.  RE: p-values: a fable

    Posted 04-11-2017 10:45
    If one p-value is good, more should be better. Why not hundreds of them? Then publish those that are <0.05. Why not thousands? I've just counted out ~50 papers. Across the papers, the median number of p-values possible is about 10,000.
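
    To put rough numbers on that, the sketch below assumes every test is independent and every null hypothesis is true, so each p-value is uniform on (0, 1). Those assumptions, the function names, and the seed are illustrative only and do not describe the papers counted above.

```python
import random


def prob_at_least_one(m, alpha=0.05):
    """Chance that at least one of m independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** m


def count_false_positives(m, alpha=0.05, seed=0):
    """Simulate m null p-values (uniform on [0, 1)) and count those below alpha."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(m))


print(prob_at_least_one(10))          # ten looks at pure noise
print(prob_at_least_one(10_000))      # ten thousand looks
print(count_false_positives(10_000))  # "publishable" results from noise alone
```

    With 10,000 possible p-values and nothing but noise, about 500 would be expected to fall below 0.05, so a reader who sees only the published ones has no way to tell signal from selection.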

    ------------------------------
    Sidney Young
    Retired
    ------------------------------



  • 10.  RE: p-values: a fable

    Posted 04-16-2017 18:26
    When doing things like ANOVA or regression, if an effect has a p-value that is very close to one, it's often trying to tell us that there is something going on that we don't know about.

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org
    ------------------------------



  • 11.  RE: p-values: a fable

    Posted 04-17-2017 07:31
    Recently, the noise of life has interfered with the signal of statistics but I'm thrilled that this particular signal has endured long enough for me to catch it.  

    George, this is brilliant.

    Susan Spruill, in her reply, compared your writing to Hemingway's prose - economical and understated.  This characteristic gives your fable added utility as an approachable and enjoyable way to explain p-values to my colleagues who are not statistically inclined.

    Thanks for sharing your insight.

    Linda

    ------------------------------
    Linda A. Landon, PhD, ELS

    Research Consultant

    PhD, Molecular Pharmacology
    Graduate Certificate, Applied Statistics
    Board-Certified Editor in the Life Sciences

    Research Communiqué
    Clear, Concise Statistics & Words
    LandonPhD@ResearchCommunique.com
    573-797-4517
    ------------------------------