I don't see a problem with p-values, but I have not read the material that all the ruckus is about.
To me it means that from your data you compute some "test statistic" s, and s has a sampling distribution S. You set a significance level a (say a = .01) in advance and do an experiment or study or whatever with a one-sided rejection region, and this produces an observed value s* and a corresponding p-value. If p = .006, this means that the probability, under the null hypothesis, of seeing a statistic at least as extreme as s* is .006, and since .006 is less than .01 we reject the null hypothesis and say that, according to our test procedure, we are at least 99% confident that the null hypothesis is false.
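Here is a minimal sketch in Python of the procedure I am describing, assuming a one-sided (upper-tail) one-sample z-test with known sigma; the data and numbers are made up purely for illustration:

# Minimal sketch: one-sided z-test with a pre-set significance level.
# The null mean, sigma, sample size, and data are illustrative only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.01                      # significance level chosen in advance
mu0, sigma, n = 0.0, 1.0, 50      # null mean, known SD, sample size

x = rng.normal(loc=0.4, scale=sigma, size=n)   # observed data

# Test statistic s*: standardized sample mean under H0: mu = mu0
s_star = (x.mean() - mu0) / (sigma / np.sqrt(n))

# One-sided p-value: P(S >= s*) under the null, where S ~ N(0, 1)
p_value = stats.norm.sf(s_star)

print(f"s* = {s_star:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0 at the 0.01 level.")
else:
    print("Fail to reject H0 at the 0.01 level.")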
Tell me how I can improve this, or whether you see a flaw in it. It seems like a good tool when used this way.
Thanks,
James Gregory (Greg) Dobbins
Original Message:
First off, the problems with p-values long predate Gelman. I like Gelman a lot, but this goes back at least to Meehl (see this article), and probably before.
Second, are we sure that banning p-values is bad? My favorite professor in graduate school (Herman Friedman) used to say "Stop p-ing on the research!" There may be cases where p = XXXX provides useful information, but they are rare.
------------------------------
Peter Flom
------------------------------
Original Message:
Sent: 07-17-2015 11:44
From: Susan Spruill
Subject: Origins of current p-value discussion
Joe,
I like to discuss whether or not results are "clinically meaningful" or "biologically relevant". I'm with you on the overuse of "significant". But it does not solve the problem of non-statisticians misusing p-values, or their desire to have a single number that tells them they have something publishable. I also like to ask my scientific colleagues whether they can tell a compelling story around their findings. This usually makes them do more literature research to answer the question "is this finding clinically relevant?"
------------------------------
Susan Spruill
Statistical Consultant
------------------------------
Original Message:
Sent: 07-16-2015 17:59
From: Joe Swintek
Subject: Origins of current p-value discussion
Throughout my education as a scientist and as a statistician, I never heard the phrase "statistically significant", not once. It was not until I started working in a biology lab that I even heard the phrase. It comes from two places: one is the column labeled "significance" to the right of the p-value in most statistical software output, and the other is the need to distinguish "yes, the null hypothesis can be rejected" (statistical significance) from "yes, this difference actually matters for the organism (or population)", which is called biological significance. I was taught to, and much prefer to, talk about degrees of evidence, using phrases like "little to no evidence", "weak evidence", and "strong evidence". I will be attending JSM this year and I am hoping I can participate in the round table.
------------------------------
Joe Swintek
Statistician
Badger Technical Services
------------------------------
Original Message:
Sent: 07-16-2015 14:18
From: James Garrett
Subject: Origins of current p-value discussion
Georgette's mention of "statistically significant" calls another point to mind: whenever possible I use the term "statistically detectable" in place of "statistically significant." It conveys what the hypothesis test outcome means: the data indicate that the observed nonzero effect is not spurious, and nothing more than that.
However, when communicating outside of our organization, I feel compelled to say "statistically significant" because it is the convention, and deviating from convention in communications can lead to confusion. If the statistics community organizes to address issues of research irreproducibility, is there any chance we could also push for a replacement for "statistically significant"?
I also will not be attending JSM.
------------------------------
James Garrett
Sr. Assoc. Dir. of Biostatistics
Novartis
------------------------------
Original Message:
Sent: 07-16-2015 13:11
From: Georgette Asherman
Subject: Origins of current p-value discussion
It is amazing how the Puritan streak in American culture seeps into everything, including science publications. The most recent concern about p-values comes mostly from Andrew Gelman regarding social science research with small effects and large variances. He pointed out, and named, a few researchers who drew conclusions by drilling through tests to find a two-sided significant p-value that 1) was probably spurious and 2) might go in the wrong direction. And earlier researchers such as Kenneth Rothman pointed out that confidence intervals show the range of variability, not just whether the interval includes 0. So a major journal declares p-values 'evil' and this spreads to journals with broader missions. It is like hysterical bans on sunshine or carbs or Barbie dolls: an item specific to circumstances and good judgement is declared taboo.
There is a difference between a research study and operational decision making. There are times when a decision has to be made: whether to change a formulation, investigate a fraud case, or release a product to humans. Cut-offs of any kind are a problem, but for the most part well-designed hypothesis testing serves a purpose. However, something has gone wrong in our communications, because too many biologists answer 'What effect size are you looking for?' with 'a significant one.'
I will not be at JSM this year but I hope your round table is successful.
Best,
Georgette Asherman
Direct Effects, LLC