Origins of current p-value discussion

1. Origins of current p-value discussion

Georgette Asherman
Posted 08-04-2015 10:06
It is amazing how the Puritan streak in American culture starts to fall into everything, including science publications. The most recent concern about p-values comes mostly from Andrew Gelman regarding social science research with small effects and large variances. He pointed out and named a few researchers who drew conclusions based on drilling through tests to find a two-sided significant p-value that 1) was probably spurious and 2) might go in the wrong direction. And previous researchers such as Kenneth Rothman pointed out that adding confidence intervals shows the range of variability, not just whether the interval includes 0. So a major journal declares p-values 'evil' and this spreads to journals with broader missions. It is like hysterical bans on sunshine or carbs or Barbie dolls--an item specific to circumstances and good judgement is declared taboo.

There is a difference between a research study and operational decision making. There are times when a decision has to be made: whether to change a formulation, investigate a fraud case or release a product to humans. Cut-offs of any kind are a problem, but for the most part well-designed hypothesis testing serves a purpose. However, something has gone wrong in our communications because too many biologists answer 'What effect size are you looking for?' with 'a significant one.'

I will not be at JSM this year but I hope your round table is successful.

Best,

Georgette Asherman

Direct Effects, LLC
2. RE: Origins of current p-value discussion

James Garrett
Posted 07-16-2015 14:19
Georgette's mention of "statistically significant" calls another point to mind: whenever possible I use the term "statistically detectable" in place of "statistically significant." It conveys what the hypothesis test outcome means: the data indicate the nonzero effect is not spurious. And nothing more than that.

However, when communicating outside of our organization, I feel compelled to say "statistically significant" because it is the convention, and deviating from convention in communications can lead to confusion. If the statistics community organizes to address issues of research irreproducibility, is there any chance we could also push for a replacement for "statistically significant?"

I also will not be attending JSM.

------------------------------
James Garrett
Sr. Assoc. Dir. of Biostatistics
Novartis
------------------------------

3. RE: Origins of current p-value discussion

Joe Swintek
Posted 07-16-2015 18:00
Throughout my education as a scientist and as a statistician, I never heard the phrase "statistically significant", not once. It was not until I started working in a biology lab that I even heard the phrase "statistically significant". It comes from two parts: one is the column labeled “significance” that sits to the right of the p-value in most outputs of statistical software, and the other is the need to distinguish “yes, the null hypothesis can be rejected” (statistical significance) from “yes, this difference actually matters for the organism (or population)”, which is called biological significance. I was taught to, and much prefer to, talk about degrees of evidence, using phrases like “little to no evidence”, “weak evidence”, and “strong evidence”. I will be attending JSM this year and I am hoping I can participate in the round table.

------------------------------
Joe Swintek
Statistician
Badger Technical Services
------------------------------

4. RE: Origins of current p-value discussion

Susan Spruill
Posted 07-17-2015 11:44
Joe,

I like to discuss whether or not results are "clinically meaningful" or "biologically relevant". I'm with you on the overuse of "significant". But it does not solve the problem of non-statisticians misusing p-values or their desire to have a singular number that tells them they have something publishable. I also like to ask my scientific colleagues if they can tell a compelling story around their findings. This usually makes them do more literature research to answer the question "is this finding clinically relevant?"

------------------------------
Susan Spruill
Statistical Consultant
------------------------------

5. RE: Origins of current p-value discussion

Peter Flom
Posted 07-17-2015 13:35
First off, the problems with p-values long predate Gelman. I like Gelman a lot, but this goes back at least to Meehl (see this article), and probably before.

Second, are we sure that banning p values is bad? My favorite professor in graduate school (Herman Friedman) used to say "Stop p-ing on the research!" There may be cases where p = XXXX provides useful information, but they are rare.

------------------------------
Peter Flom
------------------------------

6. RE: Origins of current p-value discussion

James Dobbins
Posted 07-17-2015 13:54
I don't see a problem with p values but have not read the material that all the ruckus is about.

To me it means that from your data you produce some "test statistic" s, and s has a distribution S. You set a significance level a (say a = .01) and do an experiment or study or whatever with a one-sided rejection region, and this produces a value s* from S with a p value for s*. If p = .006, then this means that the probability of seeing an s like this under the null hypothesis is .006, and since .006 is less than .01 we reject the null hypothesis and say that according to our test procedure we are at least 99% confident that the null hypothesis is false.

Tell me how I can improve or if you see a flaw in this. This seems like a good tool when used like this.

Thanks,

James Gregory (Greg) Dobbins
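
The procedure described in this post can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the test statistic is assumed to be standard normal under the null hypothesis, the rejection region is assumed to be the upper tail, and the function name `one_sided_p_value` and the observed value 2.51 are hypothetical, chosen so that the p value lands near the .006 used above.

```python
from statistics import NormalDist


def one_sided_p_value(s_obs: float) -> float:
    """Upper-tail p value, assuming S ~ N(0, 1) under the null (an assumption)."""
    return 1.0 - NormalDist().cdf(s_obs)


alpha = 0.01   # significance level a, fixed before looking at the data
s_obs = 2.51   # hypothetical observed value s* of the test statistic

p = one_sided_p_value(s_obs)
reject = p < alpha  # the decision rule: reject the null when p falls below a

print(f"p = {p:.4f}, reject null at a = {alpha}: {reject}")
```

The quantity computed here is P(S >= s* | null hypothesis), that is, a probability about the data calculated under the assumption that the null is true.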

7. RE: Origins of current p-value discussion

Rod Elsdon
Posted 07-23-2015 11:58
Hello Colleagues:

We Statisticians understand that a "P-Value" is actually a type of conditional probability (the condition being that we assume the null hypothesis is true).

If the P-Value is "small", then we change our minds and reject the null hypothesis and accept the alternative.

Here is an idea when it comes to communicating with non-statisticians:

To reduce the odds of confusing non-statisticians, perhaps we should use the phrase "Probability-Value" instead of P-Value.

Believe it or not, there are many out there that do not understand that a P-Value is actually a conditional probability.

Rod Elsdon

Chaffey College

8. RE: Origins of current p-value discussion

Jason Machan
Posted 07-23-2015 13:35
Rod,

I really like this suggestion. I think it is insightful and I like that it places the burden on us.

The phrases "p-value" and "[statistically] significant" have become too-short semantic shortcuts. BUT, it's not the calculations that are the problem, it's the "illusion that communication has occurred" (to paraphrase GB Shaw) when they sit in a table or in text or as we're holding a conversation.

Physicians can't help their patients if the patients don't understand questions enough to give appropriate feedback…and it's the physician's job to make sure that happens. The longer they're in practice, the more developed this soft-skill.

I feel similarly about our role. One of the most important parts of our job is to make sure our collaborators understand what we're really doing enough to help us gut-check our logic and interpretations regarding their data. I'm not talking about the summation of y-sub-i-sub-j stuff…(unless they want to)…but enough of the conceptuals to know our gears are meshing.

Often as I'm going through a set of results with someone I feel would be receptive, I find myself guiding them through the question like this:

"What are the chances…even though we've split them up into two buckets based on gender…that with regards to this particular outcome, they're really just part of a single bucket? That probability, the p-value, is 0.20. That is, if we were to have randomly placed patients like these into 2 buckets a bazillion times, we'd have gotten a difference this big or bigger in about 20% of them…so, not quite rare enough for us to call it statistically significant."

It also, I hope, gives them enough of a glimpse into what we're doing conceptually to challenge the hardline of alpha=0.05.
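
The "two buckets" thought experiment above reads much like a permutation test, and can be sketched roughly as follows. This is an illustration only, not the analysis behind the 0.20 in the example: the function name `permutation_p_value`, the outcome scores, the number of shuffles and the use of a difference in means are all hypothetical choices.

```python
import random
from statistics import fmean


def permutation_p_value(bucket_a, bucket_b, n_shuffles=10_000, seed=0):
    """Share of random re-splits whose absolute difference in means
    is at least as large as the difference actually observed."""
    rng = random.Random(seed)
    observed = abs(fmean(bucket_a) - fmean(bucket_b))
    pooled = list(bucket_a) + list(bucket_b)  # treat everyone as one bucket
    n_a = len(bucket_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly re-assign patients to two buckets
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_shuffles


# Hypothetical outcome scores for the two buckets.
bucket_a = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2]
bucket_b = [10.5, 9.1, 10.0, 11.2, 9.6, 9.9]

print(f"permutation p-value: {permutation_p_value(bucket_a, bucket_b):.3f}")
```

The returned fraction plays the role of the "about 20% of them" in the conversation above: how often pure reshuffling produces a difference this big or bigger.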

My 2 cents!

J

"A good decision is based on knowledge and not on numbers." - Plato

Jason T. Machan, Ph.D.
Director, Lifespan Biostatistics Core,
     Lifespan Hospital System
Research Scientist, Biostatistics, Research
     Rhode Island Hospital
Associate Professor, Departments of Orthopaedics and Surgery
     The Warren Alpert Medical School, Brown University
Director Biostatistics Externship, Adjunct Associate Professor, Department of Psychology
     University of Rhode Island


9. RE: Origins of current p-value discussion

Charles Coleman
Posted 07-23-2015 14:33
I remember a JSM talk in which the speaker spoke about using False Positives and False Negatives to replace the Type I/Type II terminology. In these terms, the p-value is the probability of a False Positive. That's relatively easy for nonstatisticians to grasp.

Chuck Coleman

10. RE: Origins of current p-value discussion

Charles Coleman
Posted 07-23-2015 14:38
A correction: I left out the conditioning event. The p-value is the probability of detecting a False Positive when the effect does not exist.

Chuck Coleman
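
One way to look at the conditioning event is by simulation: generate many studies in which the effect truly does not exist and count how often a test at a fixed cut-off still comes out positive. The sketch below is a rough illustration under assumptions that are not in the post: normally distributed data with known standard deviation 1, a two-sided z-test, a cut-off of 0.05, and the hypothetical function name `false_positive_rate`.

```python
import random
from statistics import NormalDist


def false_positive_rate(alpha=0.05, n_studies=20_000, n_per_study=30, seed=1):
    """Simulate studies with no real effect and return the share with p < alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_studies):
        # Data drawn with true mean 0, so the null hypothesis holds by construction.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_per_study)]
        z = (sum(sample) / n_per_study) * n_per_study ** 0.5  # sigma assumed known (= 1)
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))              # two-sided p-value
        if p < alpha:
            rejections += 1
    return rejections / n_studies


rate = false_positive_rate()
print(f"share of null studies with p < 0.05: {rate:.3f}")
```

Under these assumptions the share of false positives should sit close to the chosen cut-off, while the p-value itself varies from one simulated study to the next.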

11. RE: Origins of current p-value discussion

John Dawson
Posted 07-23-2015 14:55
The benefits of p-values accrue from ready, widespread knowledge of a common statistical concept.

The shortcomings of p-values are shortcomings in the communication of more specific information.

As statisticians our job is to be precise but understandable, and this is an art as much as a science:

"We have left undone those things which we ought to have done;

And we have done those things that we ought not to have done"

- vs -

"Type I and II errors were committed"

------------------------------
John Dawson
Assistant Professor
Texas Tech University
------------------------------

12. RE: Origins of current p-value discussion

Dalton Hance
Posted 07-23-2015 19:16
Perhaps Statistics only has itself to blame? As John points out, the p-value has somehow become ubiquitous. It has become the de facto requirement for a discovery or theory to be considered believable. In some ways this is a victory for Statistics in that we have convinced almost every scientific field that a pretty theory is not enough; theory must be supported by data and that data must have enough weight to be considered plausible. Of course we know the p-value is only one part of a practice. And that without a rigorous design that controls for confounding variables and spurious correlations, or a clearly defined scope of inference or a standard of ethics that recognizes the dangers in data snooping and multiple comparisons, the p-value is meaningless.

A metaphor: it's as if we've sold the world on religion, but the masses missed all that stuff about the importance of deep contemplation and spiritual inquiry and only take this message away on Sunday: "It doesn't matter what you do, so long as you ask for forgiveness for your sins."

------------------------------
Dalton Hance
Environmental Statistician

Anchor QEA LLC
------------------------------

13. RE: Origins of current p-value discussion

John Dawson
Posted 07-23-2015 20:52
If statistics is the religion we've sold to the masses, then our current p-value practices do not reflect original dogma. Rather, they are a corruption of the teachings of our patron saint:

Personally, the writer prefers to set a low standard of significance at the 5 percent point … A scientific fact should be regarded as experimentally established only if a properly designed experiment rarely fails to give this level of significance
- R. A. Fisher, Statistical Methods for Research Workers, 1926

One instance of p < 0.05 was never intended to be the final word in any investigation.

------------------------------
John Dawson
Assistant Professor
Texas Tech University
------------------------------

14. RE: Origins of current p-value discussion

R. Latta
Posted 07-24-2015 06:41
John, you have hit the nail precisely on its head. My graduate work in statistics was done at Iowa State using a later edition of a Fisher text. The magic words are 'only,' 'rarely' and 'experiment' in my estimation. The use of p values found once and/or not in controlled experiments is misleading. The lack of replicable results seen in the literature is well documented.

Further issues arise when claims of 'scientific consensus' are made about complex model results as substitutes for data and experimental results. When those models don't predict actual data, the data definition is changed rather than concluding the model is wrong. Fisher is probably rolling over in his grave at the extravagant claims made about global warming.
------------------------------
R. Latta
Executive Director
YTMBA Research & Consulting
------------------------------

15. RE: Origins of current p-value discussion

Knut Wittkowski
Posted 07-24-2015 11:40
I agree. In 1956 (Statistical Methods and Scientific Inference), Fisher clarified:

No scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas.

------------------------------
Knut Wittkowski
Head, Dept. Biostatistics, Epidemiology, and Research Design
Rockefeller University
------------------------------

16. RE: Origins of current p-value discussion

Virginia Recta
Posted 07-24-2015 07:13
The theme of JSM 2016 is “The Extraordinary Power of Statistics.”

Can this current discussion about the use and abuse of the p-value be a session titled "With great power comes great responsibility"? ;-) I know Spiderman said this, but he can't have been the first to say it, or the last.

------------------------------
Jean Recta
Mathematical Statistician
FDA/Center for Veterinary Medicine
------------------------------

17. RE: Origins of current p-value discussion

Michiko Wolcott
Posted 07-24-2015 19:58
I like the idea. We'll keep this in mind when we meet for the roundtable!

------------------------------
Michiko Wolcott
Principal Consultant
Msight Analytics
------------------------------

18. RE: Origins of current p-value discussion

Nelson Lipshutz
Posted 07-23-2015 18:04
It wouldn't be a bad idea to finally establish a consistent p-value convention. Half the time, statisticians announce a low p-value to denote a high probability of a real effect; half the time, they announce a high p-value to denote a high probability of a real effect. We really need to stop using the same term to denote p and 1-p when we discuss p-values.

------------------------------
Nelson Lipshutz
Regulatory Research Corp.
------------------------------

19. RE: Origins of current p-value discussion

Peter Flom
Posted 07-23-2015 18:08
I have never seen anyone use a high p-value to denote a high probability of a real effect. Not once.

If anyone actually does that, it would, of course, lead to huge confusion.

------------------------------
Peter Flom
------------------------------

20. RE: Origins of current p-value discussion

David Mangen
Posted 07-23-2015 19:15
I certainly have not seen it used half the time, but with some (poorly trained market researchers, in particular) you will see them tout that "this finding is highly significant [95% confidence]" or some such statement. I don't think that they believe that a low number can be "good" -- or perhaps they doubt that their clients will believe it.

------------------------------
David Mangen
------------------------------

21. RE: Origins of current p-value discussion

Knut Wittkowski
Posted 07-24-2015 11:33
I think this is just a matter of language. A small p-value (or a high -log(p)) is an indicator of high significance, though it indicates neither "a high probability of a real effect" (unless this is used in a very vague, non-quantitative way) nor "a low probability of a false positive" (which would depend on the prior probability of a true positive), but rather a low prevalence of a result at least as significant under the null hypothesis.

------------------------------
Knut Wittkowski
Head, Dept. Biostatistics, Epidemiology, and Research Design
Rockefeller University
------------------------------
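
The dependence on the prior probability mentioned in this post can be made concrete with Bayes' rule. The sketch below is a simplified illustration, assuming a fixed cut-off alpha and a single fixed power for every real effect; the function name `prob_false_positive_given_significant` and all numeric inputs are hypothetical.

```python
def prob_false_positive_given_significant(prior_true, alpha=0.05, power=0.8):
    """P(no real effect | significant result), by Bayes' rule.

    prior_true: assumed prior probability that a tested effect is real.
    alpha:      assumed probability of a significant result when there is no effect.
    power:      assumed probability of a significant result when the effect is real.
    """
    false_pos = alpha * (1.0 - prior_true)  # significant, but no real effect
    true_pos = power * prior_true           # significant, and the effect is real
    return false_pos / (false_pos + true_pos)


for prior in (0.5, 0.1, 0.01):
    share = prob_false_positive_given_significant(prior)
    print(f"prior = {prior:<4}: P(false positive | significant) = {share:.2f}")
```

With the same cut-off, the share of significant results that are false positives changes a great deal as the assumed prior changes, which is why a small p-value alone does not determine it.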
