William, I meant to say that the p-value p fails to measure evidence ONLY IF one adopts the Bayesian / evidentialist interpretation of evidence. It's meaningless to say that p does or does not measure evidence, because frequentism (in which p is generated) has no concept of evidence. Thus, p might measure evidence as I conceive of evidence, but not as you conceive of evidence, or vice versa.
If we're identifying problems with p, maybe a good one to point out is that everybody argues about whether p measures evidence, but most of the time we don't say what we mean by "evidence," so the argument is meaningless. I don't see your JSM paper addressing this question, either.
I agree that the discrepancy between estimation (using CIs) & testing is a problem. Indeed, it's strange that, in calculating p for a 1-sided test (say, H0: mu <= 0), we don't assume H0 itself but rather the boundary between H0 & the alternative hypothesis Ha. This practice arises because, if we were to assume H0 as a whole, we'd need to calculate p by integrating over the parameter space. Since that type of integration is not allowed in frequentism, we are left with calculating p assuming a single point in H0, & that point is the boundary. We choose the boundary because it's the point that maximizes p. In other words, if p (calculated at the boundary) < alpha for some alpha, then p (assuming any other point in H0) is also < alpha.
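To make the boundary-point argument concrete, here is a minimal sketch (my own illustration, not from the thread; the observed mean, sigma, and n are made-up numbers) of a one-sided z-test of H0: mu <= 0. Computing p at several candidate points inside H0 shows that the boundary point mu0 = 0 gives the largest p, so a rejection at the boundary implies rejection at every other point in H0:

```python
import math

def p_value(xbar_obs, mu0, sigma, n):
    """One-sided p-value assuming true mean mu0: P(Xbar >= xbar_obs | mu = mu0)."""
    z = (xbar_obs - mu0) / (sigma / math.sqrt(n))
    # Standard-normal survival function via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

xbar_obs, sigma, n = 0.4, 1.0, 25        # illustrative numbers, not from the thread
candidates = [0.0, -0.1, -0.5, -1.0]      # points inside H0: mu <= 0
ps = [p_value(xbar_obs, mu0, sigma, n) for mu0 in candidates]

# p shrinks as mu0 moves away from the boundary, so p(0) is the supremum
# over H0; if p(0) < alpha, then p(mu0) < alpha for every mu0 in H0.
assert ps[0] == max(ps)
print(ps)
```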
I hope to have a closer look at your JSM paper. Do you find it interesting that, after 100+ years, people still can't agree on the meaning of p, & yet most statisticians keep using it? My opinions about the reasons for that appear in a book, "Christian and Humanist Foundations for Statistical Inference."
In any case, thanks for reminding everyone (as does S.N. Goodman!) of the difference between hypothesis testing & significance testing; the common conflation of the 2 also generates much confusion.
-------------------------------------------
Andrew Hartley
Associate Statistical Science Director
PPD, Inc.
-------------------------------------------
Original Message:
Sent: 11-17-2014 10:50
From: William Goodman
Subject: significant
Andrew is right that the p-value is not, directly, a measure of evidence for the truth of H0, but based on simulations that can be conducted, I'd suggest that this is not the p-value's main problem; under certain conditions, one could estimate an evidence measure from the p-value.
I'd suggest that this is the bigger problem: "...Why, when estimating a parameter, do we provide a range (a confidence interval), but when testing a hypothesis about a parameter (e.g. μ = x) we proceed as if '=' entails exact equality of the parameter with x? That ...is not the standard expected for power calculations, where we are satisfied to reject H0 if the result is merely 'detectably' different from (exact) H0." In practical terms, H0's "thickness" matters a lot.
In each of thousands of simulated cases in an experiment, (a) a sample was drawn from a population with a known "true" mean (the simulated true mean changed from case to case); (b) based on the sample, the null hypothesis that the true mean = 100 was tested, conventionally; and (c) data were recorded for the p-value of that test, the "actual" distance between the "true" mean and that sample's mean, and (in experiments where these were varied) the simulated value of sigma, etc.
Based on the data, here's a rule of thumb I've found to be quite robust to changes in the details of the experiment (I'd love to see someone replicate or improve it):
"If the effect size is not at least as large as the specified H0 thickness (e.g. the detectable distance you'd use when calculating sample size), or, preferably, a bit larger, then the best guess is to stick with H0 as likely true (in the 'thick' sense of 'true'), regardless of what p-value you obtain. If, on the other hand, the effect size is quite a bit larger than the H0 thickness, then rejecting H0 is safer, even if the p-value on that occasion is not that persuasive."
Here's where I'm quoting from: http://www.statlit.org/pdf/2010GoodmanASA.pdf
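A simulation along the lines described above can be sketched as follows. This is my own guess at the setup, not Bill's actual code: the "thickness" of 5, sigma of 10, sample size of 30, and the range of simulated true means are all illustrative assumptions. It compares how often the conventional p < 0.05 rule and the "observed effect exceeds the H0 thickness" rule each correctly classify whether the thick H0 is false:

```python
import math
import random

random.seed(1)
MU0, SIGMA, N = 100.0, 10.0, 30   # hypothesized mean, known sigma, sample size
THICKNESS = 5.0                    # assumed detectable distance defining the "thick" H0
ALPHA = 0.05
CASES = 4000

correct_p = correct_rule = 0
for _ in range(CASES):
    true_mu = random.uniform(90.0, 110.0)              # true mean changes each case
    sample = [random.gauss(true_mu, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    z = (xbar - MU0) / (SIGMA / math.sqrt(N))
    p = math.erfc(abs(z) / math.sqrt(2))               # two-sided z-test p-value
    truth = abs(true_mu - MU0) > THICKNESS             # thick H0 is actually false
    correct_p += (p < ALPHA) == truth                  # verdict from p alone
    correct_rule += (abs(xbar - MU0) > THICKNESS) == truth  # verdict from effect size

acc_p, acc_rule = correct_p / CASES, correct_rule / CASES
print(acc_p, acc_rule)
```

Under these assumptions, the effect-size rule tends to classify the thick H0 more accurately than the p-value alone, because the point-null test rejects for effects that are detectable but smaller than the thickness.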
Best regards,
Bill
-------------------------------------------
William Goodman
University of Ontario Institute of Technology
-------------------------------------------
Original Message:
Sent: 11-14-2014 09:01
From: Andrew Hartley
Subject: significant
Joseph, I like your first suggestion, as it points to the meaninglessness of statistical significance.
However, I wonder if your 2nd suggestion is misleading, since it seems to assume some definition of "evidence." "Evidence" is defined within the Bayesian world (see writings by Richard Royall, Steven Goodman, Veronica Vieland, ...), & the p-value does NOT measure that type of evidence. In the frequentist world, though, "evidence" is not a defined concept.
-------------------------------------------
Andrew Hartley
Associate Statistical Science Director
PPD, Inc.
-------------------------------------------
Original Message:
Sent: 11-13-2014 07:40
From: Joseph Nolan
Subject: significant
I am tired of seeing students write the words "statistically significant" too. However, rather than coining a replacement, I would propose that they simply not be allowed to use those words, which in my opinion just encourage them to leave out the context. Instead, I encourage them to write something like this:
Based on the p-value of 0.0027, we find evidence that the average number of donuts eaten on a weekly basis is larger for statisticians than it is for biologists.
OR
Since our sample resulted in a p-value of 0.38, we lack evidence that the average weekly rainfall in New York City is associated with the number of cars crossing into Manhattan.
Cheers,
Joe
-------------------------------------------
Joseph Nolan
Associate Professor of Statistics
Director, Burkardt Consulting Center
NKU Department of Mathematics & Statistics
-------------------------------------------
Original Message:
Sent: 11-12-2014 14:03
From: Robert Lehr
Subject: significant
I'm tired of saying or writing the adverb, adjective pair "statistically significant".
I would propose the word "staticant" to replace those clumsy nine syllables.
stat-i-cant (stăt-ĭ-kănt) adj. 1. statistically significant