How about we make P-values reflect the probability of someone getting different results?
Suppose we have a t-test where T-value = Difference / Std Err. If we have a difference of 6.00 and a Std Err of 2.00, we get a T-value of 3.00. If the T-critical value is 2.00, we claim, "We reject the null hypothesis." We can build a confidence interval of 6.00 +/- T-crit*Std Err => 6.00 +/- 2.00*2.00 => (2.00, 10.00). The p-value of the test is less than 0.05.
If someone else runs a replicate experiment and finds a Difference < T-crit*Std Err, they will fail to reject the null hypothesis. Based on the data we have, any time someone finds a difference less than 4.00, they get a different result, yet differences from 2.00 to 4.00 are all within our original confidence interval. If the replicate's estimated difference is roughly Normal with mean 6.00 and standard error 2.00, the chance it falls below 4.00 is about P(Z < -1), so there is about a 15% chance someone will fail to replicate our results! How does a 15% chance of failing to replicate the result square with a p-value of less than 0.05?
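Here is a minimal sketch of that arithmetic in Python, using the toy numbers above; treating the replicate's estimated difference as Normal(6.00, 2.00) is my simplifying assumption, not a full account of replication:

    from scipy import stats

    diff, se, t_crit = 6.00, 2.00, 2.00            # toy values from the example above
    t_value = diff / se                            # 3.00
    ci = (diff - t_crit * se, diff + t_crit * se)  # (2.00, 10.00)

    # Chance a replicate's estimated difference lands below the rejection
    # threshold t_crit * se = 4.00, assuming it is ~ Normal(6.00, 2.00).
    p_fail = stats.norm.cdf(t_crit * se, loc=diff, scale=se)

    print(f"t = {t_value:.2f}, CI = {ci}, P(fail to replicate) = {p_fail:.1%}")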
Furthermore, if we look at a lot of the assumptions we use in our tests, there is plenty of evidence telling us those assumptions are WRONG! But we don't listen. Consider for a moment the assumption that our sample is "representative" of the true population. How often is that assumption true? As best I can tell, not often enough. Think about all the psychology experiments that are not repeatable. Think of all the clinical trials that "look promising" in Phase 2 and fail miserably in Phase 3. If the sample used in Phase 2 were representative of the population, then Phase 3 should bolster our claims, not diminish them.
If we look at something like bootstrapping, we assume the data we already have is representative of the population. Then we resample from this small sample. If the sample is not representative, then bootstrapping is a pointless exercise. A better idea would be to use the data we have and generate 100's to 1,000's of random values and use the mean and standard deviation of our sample data. That way, we keep the same variability our original sample had.
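For what it's worth, here is a rough sketch of the contrast, with made-up data and a Normal distribution assumed for the parametric draws (that distributional choice is my reading of the proposal, not something prescribed above):

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(10, 3, size=25)        # made-up "small sample"

    # Classic bootstrap: resample the observed values with replacement.
    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(1000)]

    # Parametric alternative: draw new values from a distribution with the
    # sample's mean and standard deviation (Normal assumed here).
    para_means = [rng.normal(sample.mean(), sample.std(ddof=1),
                             size=sample.size).mean()
                  for _ in range(1000)]

    print(np.percentile(boot_means, [2.5, 97.5]))
    print(np.percentile(para_means, [2.5, 97.5]))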
For those of us who use regression models, how about we start using R^2(predicted) and confusion matrices as part of our diagnostics? I've seen several "textbook" data sets where the model "looks good" but fails miserably at predicting the event outcome. The first data set I noticed this on was David Kleinbaum's Evans County data. About 10% of the people in the study had CHD. The model David created accurately predicts only 6-12 of the 60+ CHD cases. A confusion matrix lets you know right away that the model isn't very good at predicting who has CHD. But since that isn't a criterion for the model, and logistic regression software doesn't offer the option, it's up to the statistician to go forth and check. What is sad is that David did everything right, based on what we learn in stats classes. He's not to blame. What is to blame is the idea that we, as statisticians, can do no wrong with data analysis, while we keep teaching ourselves and reinforcing bad habits.
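A minimal sketch of the kind of check I mean, using simulated data (not the Evans County data) with a rare, weakly predicted outcome, assuming scikit-learn is available:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix

    rng = np.random.default_rng(1)
    n = 600
    X = rng.normal(size=(n, 3))                    # made-up risk factors
    # Weak signal plus a rare outcome (~10% event rate, like CHD above).
    p = 1 / (1 + np.exp(-(-2.4 + 0.5 * X[:, 0])))
    y = rng.binomial(1, p)

    model = LogisticRegression().fit(X, y)
    pred = model.predict(X)                        # default 0.5 cutoff

    # Rows = actual (0, 1); columns = predicted (0, 1). With a rare outcome
    # and modest predictors, the model typically predicts "no event" for
    # almost everyone, and the confusion matrix makes that obvious at once.
    print(confusion_matrix(y, pred))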
We have the technology to test our assumptions. We have the ability to use the proper regression model for the data. We don't need to make "Normal approximations" anymore, but we still do. We allow "it looks good" to substitute for actually being good. We assume simplicity in our models. If the systems we were modeling really were simple, why wouldn't we already know everything about them? It's because we use simple models on complex systems and then, fearing "overfitting," we tend to severely underfit them. Then the model doesn't do a good job of describing the system, and we wonder why so many scientists loathe statistics and statisticians.
Maybe we need to lead by example. Perhaps all statisticians should be required to get a minor in an area outside of mathematics. Perhaps statisticians need to take classes from industrial engineering departments so they can understand where a lot of their data comes from. Perhaps statisticians need to take classes in chemistry and biology to understand that most scientists refer to "repeated measures" as "replicates." Perhaps statisticians should work with the devices that generate the data they use, to get an understanding of the QC issues that crop up and tend to go unnoticed because a lot of scientists believe "consistency => quality" and, for QC samples, "between the lines => consistency."
We can all do a lot better. We need to do a lot better. And we, as statisticians, need to be the ones to change first!
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 03-07-2016 11:09
From: Steve Pierson
Subject: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein
Dear All,
Below is the message being sent today to ASA members from ASA President Jessica Utts and Executive Director Ron Wasserstein regarding today's release of an ASA Statement on p-values.
The relevant links are:
Steve
ASA Statement Released Today
Dear Member,
Today, the American Statistical Association Board of Directors issued a statement on p-values and statistical significance. We intend the statement, developed over many months in consultation with a large panel of experts, to draw renewed and vigorous attention to changing research practices that have contributed to a reproducibility crisis in science.
"Widespread use of 'statistical significance' (generally interpreted as 'p < 0.05') as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process," says the ASA statement (in part). By putting the authority of the world's largest community of statisticians behind such a statement, we seek to begin a broad-based discussion of how to more effectively and appropriately use statistical methods as part of the scientific reasoning process.
In short, we envision a new era, in which the broad scientific community recognizes what statisticians have been advocating for many years. In this "post p < .05 era," the full power of statistical argumentation in all its nuance will be brought to bear to advance science, rather than making decisions simply by reducing complex models and methods to a single number and its relationship to an arbitrary threshold. This new era would be marked by radical change to how editorial decisions are made regarding what is publishable, removing the temptation to inappropriately hunt for statistical significance as a justification for publication. In such an era, every aspect of the investigative process would have its appropriate weight in the ultimate decision about the value of a research contribution.
Is such an era beyond reach? We think not, but we need your help in making sure this opportunity is not lost.
The statement is available freely online to all at The American Statistician Latest Articles website. You'll find an introduction that describes the reasons for developing the statement and the process by which it was developed. You'll also find a rich set of discussion papers commenting on various aspects of the statement and related matters.
This is the first time the ASA has spoken so publicly about a fundamental part of statistical theory and practice. We urge you to share this statement with appropriate colleagues and spread the word via social media. We also urge you to share your comments about the statement with the ASA Community via ASA Connect. Of course, you are more than welcome to email your comments directly to us at ron@amstat.org.
On behalf of the ASA Board of Directors, thank you!
Sincerely,
Jessica Utts, President, American Statistical Association
Ron Wasserstein, Executive Director, American Statistical Association
------------------------------
Steve Pierson
Director of Science Policy
American Statistical Association
------------------------------