ASA Connect


Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

  • 1.  Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-07-2016 11:10

    Dear All,

    Below is the message being sent today to ASA members from ASA President Jessica Utts and Executive Director Ron Wasserstein regarding today's release of an ASA Statement on p-values.

    The relevant links are:

    Steve

     

     ASA Statement Released Today

    Dear Member,

    Today, the American Statistical Association Board of Directors issued a statement on p-values and statistical significance. We intend the statement, developed over many months in consultation with a large panel of experts, to draw renewed and vigorous attention to changing research practices that have contributed to a reproducibility crisis in science.

    "Widespread use of 'statistical significance' (generally interpreted as 'p < 0.05') as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process," says the ASA statement (in part). By putting the authority of the world's largest community of statisticians behind such a statement, we seek to begin a broad-based discussion of how to more effectively and appropriately use statistical methods as part of the scientific reasoning process.

    In short, we envision a new era, in which the broad scientific community recognizes what statisticians have been advocating for many years. In this "post p < .05 era," the full power of statistical argumentation in all its nuance will be brought to bear to advance science, rather than making decisions simply by reducing complex models and methods to a single number and its relationship to an arbitrary threshold. This new era would be marked by radical change to how editorial decisions are made regarding what is publishable, removing the temptation to inappropriately hunt for statistical significance as a justification for publication. In such an era, every aspect of the investigative process would have its appropriate weight in the ultimate decision about the value of a research contribution.

    Is such an era beyond reach? We think not, but we need your help in making sure this opportunity is not lost.

    The statement is available freely online to all at The American Statistician Latest Articles website. You'll find an introduction that describes the reasons for developing the statement and the process by which it was developed. You'll also find a rich set of discussion papers commenting on various aspects of the statement and related matters.

    This is the first time the ASA has spoken so publicly about a fundamental part of statistical theory and practice. We urge you to share this statement with appropriate colleagues and spread the word via social media. We also urge you to share your comments about the statement with the ASA Community via ASA Connect. Of course, you are more than welcome to email your comments directly to us at ron@amstat.org.

    On behalf of the ASA Board of Directors, thank you!

    Sincerely,

    Jessica Utts
    President
    American Statistical Association


    Ron Wasserstein
    Executive Director
    American Statistical Association

    ------------------------------
    Steve Pierson
    Director of Science Policy
    American Statistical Association
    ------------------------------


  • 2.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-08-2016 06:24

    That 0.05 should not be used regardless of sample size should always have been obvious.  But people don't listen when those with a great deal of clout won't.  It's about time that changed.  Congratulations.

    https://www.researchgate.net/publication/262971440_Practical_Interpretation_of_Hypothesis_Tests_-_letter_to_the_editor_-_TAS

    ------------------------------
    James Knaub
    Lead Mathematical Statistician
    Retired



  • 3.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-08-2016 08:38

    A p-value of <= 0.05 is not "God" (or should I say "Fisher"?) given.  As Fisher and many others indicated, the p-value criterion should depend on the nature of the data, the sample size, and the associated Type II error rate for that level of significance.  Unfortunately, in the regulatory and other industries, it seems to be treated as the Bible or the Bhagavad Gita.  This convention should be changed based on the factors I just described.  In industry, an SOP (Standard Operating Procedure) is generally written with a p-value of <= 0.05, and changing that creates a lot of headaches and other issues! As a group, we in the ASA may be able to change it for the better.

    Ajit K. Thakur, Ph.D.

    Retired Biostatistician and a Consultant

    ------------------------------
    Ajit Thakur
    Associate Director



  • 4.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-09-2016 07:29

    Healthcare data sets can be very large or very small.  If someone presents a statistical analysis with a p-value, the question "what's next?" follows.  Sometimes it is very difficult to explain what comes next after an analysis is completed using the p-value criterion.  There are other ways to analyze data besides a p-value, such as confidence intervals, control charts (if you want to see whether a particular process is in statistical control over time), etc.

    When I was getting my doctorate in biostatistics, there was a great debate between my biostatistics department and the epidemiology department about using p-values.  The epidemiologists did not use p-values in their analyses; they used confidence intervals and did not believe the p-value had any relevance.  It became an interesting debate.

    I am glad the ASA is changing its stance on p-values.  Now, the next step is to get this stance into the statistics classrooms.

    ------------------------------
    Patricia Schlorke, DrPH
    Biostatistician
    Texas Health Resources



  • 5.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-09-2016 11:00

    It has always been disturbing that in the medical literature p<0.05 is interpreted as "significant" instead of "statistically significant," often with no discussion whatsoever of "clinically meaningful."  When consulting, I give both the p-value and some sort of estimate of effect size, usually a difference of means with a confidence interval or a ratio measure with a CI. The 95% CI is simply an inversion of the test statistic in most cases, so they are just different ways to convey the same message. The trouble is that reviewers often focus on the p-values and medical researchers aren't always up to speed on interpreting p-values. For those you consult with on a regular basis, I think it is worthwhile to educate them.
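
    To make that concrete, here is a minimal sketch (mine, not part of the original post) showing how the p-value and the 95% CI from the same pooled two-sample t-test carry the same information; the data and group names are illustrative assumptions, not real consulting data.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        a = rng.normal(10.0, 2.0, 40)   # hypothetical control group
        b = rng.normal(11.0, 2.0, 40)   # hypothetical treatment group

        t_stat, p_val = stats.ttest_ind(a, b)               # pooled two-sample t-test
        df = len(a) + len(b) - 2
        sp2 = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / df
        se = np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))        # pooled standard error
        diff = a.mean() - b.mean()
        t_crit = stats.t.ppf(0.975, df)
        # The CI is the set of differences not rejected at alpha = 0.05: the same
        # information as the p-value, reported as an effect-size range.
        print(f"p = {p_val:.3f}, 95% CI for the difference: "
              f"({diff - t_crit * se:.2f}, {diff + t_crit * se:.2f})")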

    They do have some understanding of the sensitivity and specificity of a test. So, if you get them to see that the same thing holds for statistical testing, a lot of them grasp the concept. In other words, we want a sensitive test that will find differences if they are there. We also would like it to be specific so we don't find things when they aren't there. Lack of sensitivity gives False Negatives or Type II errors: we fail to find differences when they are there. Lack of specificity gives False Positives or Type I errors: we find differences that aren't really there. They get that part.

    Alpha is the probability of a Type I error or False Positive and can only occur if we find differences. Beta is the probability of a Type II error or False Negative and can only occur if we fail to find differences. Since our hypothesis-testing situation is always geared toward reducing errors but also toward finding differences, we are usually very interested in eliminating False Negatives, or increasing the sensitivity of the test. Statisticians call sensitivity the "power" of the test, and we can get more power with larger sample sizes.

    So, alpha is something we control by deciding up front how large our False Positive risk is. One way to think about that is through the "pre-test probabilities," just like the prevalence of disease. Fishing for differences is like a screening test: too many False Positives. Very directed hypothesis tests, however, are like diagnostic tests, and you have a lower expectation of finding something that isn't there.
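
    A tiny numerical illustration (mine, not the poster's) of that pre-test-probability point, with hypothetical numbers: even with alpha = 0.05 and 80% power, a low prior probability that an effect is real leaves a sizable share of "significant" findings as false positives.

        # All numbers below are illustrative assumptions.
        prior = 0.10          # proportion of tested hypotheses that are actually true
        alpha, power = 0.05, 0.80

        true_pos = prior * power                   # true effects that reach significance
        false_pos = (1 - prior) * alpha            # null effects that reach significance
        ppv = true_pos / (true_pos + false_pos)    # "post-test" probability a finding is real
        print(f"P(effect is real | p < 0.05) = {ppv:.2f}")   # about 0.64 here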

    I also think this is missed by users of statistics: in the age of Big Data there is a lot of opportunity for fishing, but understanding that p<0.05 means something quite different in these instances is vital. It's really a matter of understanding decision making, so if you think about what the GWAS analysts did, they tried to eliminate False Positives by lowering the Type I error probability.

    The big jump that few understand is that the p-value is not alpha. It is more like the result of a single diagnostic test, where the probability that you have made the right decision is tied to the sensitivity, the specificity (LR+), and the pre-test probability, something we can control by doing what the Declaration of Helsinki calls for: a thorough understanding of the problem prior to doing the research, through a literature review and/or a pilot study.

    ------------------------------
    Warren May
    Professor



  • 6.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-09-2016 13:59

    1) Sir R. A. Fisher was against the idea of a fixed alpha level of "significance."  He said so in his writings.

    2) The U.S. Supreme Court considered the matter as the crux of a case and ruled against a fixed alpha:

    No. 09-1156, decided March 22, 2011.

    They used the term "bright line" for a fixed alpha and said that it was not an acceptable basis for a decision, because it ignored everything else about the issue.

    ------------------------------
    Kenneth Burnham
    Colorado State University



  • 7.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-09-2016 14:08

    3) As best I can tell, Fisher would not want to use a P-value if there was an alternative model to the null. That situation was then not one of testing a null, but of model comparison, strength of evidence, effect size estimation, etc. I think we should concur: null hypothesis testing is mostly used where it is not even the appropriate basis for inference.

    ------------------------------
    Kenneth Burnham
    Colorado State University



  • 8.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-16-2016 03:24

    This is a very nice example which I am going to use in the classroom.
    Kenneth - thank you very much for this reference.

    ------------------------------
    Christian Graf
    Dipl.-Math.
    Qualitaetssicherung & Statistik

    "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

    Ronald Fisher in 'Presidential Address by Professor R. A. Fisher, Sc.D., F.R.S. Sankhyā: The Indian Journal of Statistics (1933-1960), Vol. 4, No. 1 (1938), pp. 14-17'



  • 9.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-09-2016 14:29

    C. R. Rao wrote about Fisher in the journal Statistical Science, 1992, pages 34-48. See Section 7.1 (page 46); if "we" had followed Fisher's advice, we would not have this P-value mess today.

    ------------------------------
    Kenneth Burnham
    Colorado State University



  • 10.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-10-2016 13:34

    As many of us in the marketing and credit areas (where sample sizes are, as the Donald says, YUGE) learned years ago, p-values can be meaningless. With large samples, many relationships are significant at very small p-values.

    What is important is the significance at the business level AND whether the relationship makes sense.

    A very small difference may be significant statistically, but it may be so small that no one cares. This is why it is important to have Subject Matter Expertise and use common sense when interpreting results.







  • 11.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-11-2016 01:20

    Rather than P-values, use odds, i.e., the likelihood ratio, which is the correct measure of evidence about a comparison; a P-value is not the correct measure of evidence. There is plenty on this in the statistical literature (e.g., Royall, R. M. 1997. Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, London, UK). In this framework one can, I think, see why P = 0.05 is a weak basis for inference (semantics aside about "weak" or "strong"). Because P = 0.05 is close to 0, people may perceive it as not-weak "evidence." But P is a very nonlinear transformation of the likelihood ratio, which is the correct measure of evidence. I will use the LR here as odds against the "null" (say, mean = 0). For a summary statistic z distributed as normal(mean, 1), odds = exp(z*z/2). All this generalizes, of course, and could be used in a more nuanced manner. A few results follow; interpret them as odds:1, e.g., 6.83:1 against the null:

    z        P         odds
    1.96     0.05        6.83
    2.17     0.03       10.53
    2.58     0.01       27.59
    3.29     0.001     224.48

    Whereas the odds (evidence) range from 1:1 to infinity:1 (i.e., they are unbounded), the range of P is from 1 to 0. People seem to look at, say, 0.05 and 0.01 and think that is not a big difference, not a lot of difference in strength of evidence. But not so: the evidence in these two cases is 6.83 vs. 27.59. Moreover, when one considers the unbounded nature of odds, 6.83 looks like weak evidence, which I think it is. However, we can dispense with subjective qualitative statements about strength of evidence and simply state the actual evidence. The issue of what the scientist wants to decide to do is up to them. A fundamental role of statistics is to extract and state the information/evidence in the data, not to tell the user what to do; i.e., not to set a bright line to demarcate a reject/no-reject decision.
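
    For anyone who wants to reproduce the table, here is a minimal sketch (mine, not part of the original post) that converts a two-sided normal-theory p-value back to z and then to the odds exp(z*z/2) described above; it assumes SciPy is available.

        from math import exp
        from scipy.stats import norm

        def odds_against_null(p_two_sided):
            """Two-sided normal-theory p-value -> (z, likelihood-ratio odds against the null)."""
            z = norm.ppf(1 - p_two_sided / 2)   # z-score matching the p-value
            return z, exp(z * z / 2)            # LR of the best-supported mean vs. the null

        for p in (0.05, 0.03, 0.01, 0.001):
            z, odds = odds_against_null(p)
            print(f"z = {z:.2f}   P = {p:<6}   odds = {odds:.2f}:1 against the null")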

    One way to view this is to use proper evidence, i.e., odds (= LR). Then if normality approximately applies (or applies exactly, or people simply compute P assuming normality anyway), the odds against the null at P = 0.05 are about 7:1. But the odds can range from 1:1 against to almost infinite against. Good data can produce 1000:1 against (then P is closer to 0, but still "close" to 0.05). When you realize the odds can be 10, 100, 1000, etc. to 1 against, then 7:1 seems weak. The issue relates to the VERY nonlinear relationship of P to odds. One sees 0.05, and everything stronger against the null lies between 0.05 and 0; compared with the full range from 1 to 0, 0.05 seems like it ought to be stronger than just "weak." But it's not. All of this relates to P NOT being a proper measure of evidence against the null.

    And done properly, the odds ratio is one-sided, so the silliness of a two-sided test goes away (for a simple case about a single estimated effect).

    Royall's book may get at some of this; I do not remember.

    Thus while P is mostly uninformative, it is the focus on null hypothesis testing when there is an alternative hypothesis that is the fundamental mistake here.

    Ken

    ------------------------------
    Kenneth Burnham
    Statistician, ASA Fellow
    Colorado State University



  • 12.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 04-03-2016 23:03
      |   view attached

    I attach a corrected "comment." The original version has, incorrectly, a negative sign in places: a typo that my mind read as a + because it knows + is correct.

    ------------------------------
    Kenneth Burnham
    Colorado State University

    Attachment(s)

    pdf
    to ASA.pdf   86 KB 1 version


  • 13.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-13-2016 05:23

    How about we make P-values reflect the probability of someone getting different results?

    Suppose that we have a t-test where the T-value = Difference/Std Err. If we have a difference of 6.00 and a Std Err of 2.00, we get a T-value of 3.00. If the T-critical value is 2.00, we claim: "We reject the null hypothesis." We can create a confidence interval of 6.00 +/- T-crit*Std Err => 6.00 +/- 2.00*2.00 => (2.00, 10.00). The P-value of the test is < 0.05.

    If someone else does a replicate experiment and they find that the Difference < T-crit*Std Err, they will fail to reject the null hypothesis. Based upon the data we have, any time someone finds a difference less than 4.00, they get different results, yet differences from 2.00 up to 4.00 are all within our original confidence interval. There is about a 15% chance someone will fail to replicate our results! How does a 15% chance of failing to replicate the results square with a P-value of less than 0.05?
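
    Here is a minimal sketch (my own, under the simplifying assumptions that the observed difference of 6.00 with Std Err 2.00 is the true effect, that a replicate study has the same standard error, and that a normal approximation is adequate) of where the roughly 15% figure comes from:

        from scipy.stats import norm

        true_diff, se, t_crit = 6.0, 2.0, 2.0       # values from the example above
        threshold = t_crit * se                     # replicate rejects only if difference > 4.00
        fail_prob = norm.cdf((threshold - true_diff) / se)   # P(replicate difference < 4.00)
        print(f"Chance a replicate fails to reject: {fail_prob:.1%}")   # about 15.9%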

    Furthermore, if we look at a lot of the assumptions we use in our tests, there is a lot of evidence telling us our assumptions are WRONG! But we don't listen. Consider for a moment our assumption that we can use a sample that is "representative" of the true population. How often is this assumption true? As best I can tell, not often enough! Think about all the psychology experiments that are not repeatable. Think of all the clinical trials that "look promising" in a Phase 2 trial and fail miserably in Phase 3. If the sample used in Phase 2 was representative of the population, then Phase 3 should bolster our claims, not diminish them.

    If we look at something like bootstrapping, we assume the data we already have are representative of the population. Then we resample from this small sample. If the sample is not representative, then bootstrapping is a pointless exercise. A better idea would be to use the data we have and generate hundreds to thousands of random values from a distribution with the mean and standard deviation of our sample data. That way, we keep the same variability our original sample had.
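
    As a concrete, entirely hypothetical illustration of the two resampling schemes just described, here is a small sketch (mine rather than the poster's) contrasting the usual nonparametric bootstrap with drawing new values from a normal distribution that has the sample's mean and standard deviation:

        import numpy as np

        rng = np.random.default_rng(0)
        sample = rng.normal(loc=10.0, scale=3.0, size=30)   # hypothetical small sample

        n_rep = 1000
        # Nonparametric bootstrap: resample the observed data with replacement.
        boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                      for _ in range(n_rep)]
        # Parametric alternative described above: draw from N(sample mean, sample sd).
        param_means = [rng.normal(sample.mean(), sample.std(ddof=1), sample.size).mean()
                       for _ in range(n_rep)]

        print("bootstrap SE of the mean:  ", round(np.std(boot_means), 3))
        print("parametric SE of the mean: ", round(np.std(param_means), 3))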

    For those of us who use regression models, how about we start using R^2(predicted) and confusion matrices as part of our diagnostics? I've seen several "textbook" data sets where the model "looks good" but fails miserably at predicting the event outcome. The first data set I noticed this on was David Kleinbaum's "Evans County" data. About 10% of the people in the study data had CHD. The model David created accurately predicts 6-12 of the 60+ cases. A confusion matrix lets you know right away that the model isn't very good for predicting who has CHD. But, since that isn't a criterion for the model, and software for logistic regression doesn't offer the option, it's up to the statistician to go forth and check things out. What is sad is that David did everything right, based on what we learn in stats classes. He's not to blame. What is to blame is the idea that we, as statisticians, can do no wrong with data analysis, while we keep teaching ourselves and reinforcing bad habits.
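
    For readers who want to try this, here is a minimal sketch (mine; it uses simulated data with roughly a 10% event rate rather than the Evans County data, and assumes scikit-learn is installed) of adding a confusion matrix to logistic-regression diagnostics:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import confusion_matrix

        rng = np.random.default_rng(1)
        X = rng.normal(size=(600, 3))                                    # hypothetical predictors
        p = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.5, 0.3]) - 2.2)))    # roughly 10% event rate
        y = rng.binomial(1, p)                                           # simulated outcome (e.g., CHD)

        model = LogisticRegression().fit(X, y)
        pred = model.predict(X)                      # default 0.5 cutoff
        print(confusion_matrix(y, pred))             # rows: actual, columns: predicted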

    We have the technology to test our assumptions. We have the ability to use the proper regression model for the data. We don't need to make "Normal approximations" anymore. But we still do. We allow "It looks good" to substitute for actually being good. We assume simplicity in our models. If the systems we were modeling were simple, then why don't we already know everything? It's because we are using simple models on complex systems; then, fearing "overfitting" a model, we tend to severely underfit it. Then the model doesn't do a good job describing the system, and we wonder why so many scientists loathe statistics and statisticians.

    Maybe we need to lead by example. Perhaps all statisticians should be required to get a minor in an area outside of mathematics. Perhaps statisticians need to take classes from industrial engineering departments so they can understand where a lot of their data comes from. Perhaps statisticians need to take classes in chemistry and biology to understand that most scientists refer to "repeated measures" as a "replicate." Perhaps statisticians should work with the devices that generate the data they use, to get an understanding of the QC issues that crop up and tend to go unnoticed, because a lot of scientists believe "Consistency => Quality" and QC samples "Between the lines => Consistency."

    We can all do a lot better. We need to do a lot better. And we, as statisticians, need to be the ones to change first!    

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 14.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-15-2016 14:39

    I'm posting this response on behalf of a non-member.  I will send your responses back to him.

    Thanks.

    Ron

    _____________________________________________________

    I was very interested to see the statement on p-values recently released by the ASA.  I was surprised, however, that it left out some of the most common misuses I see, namely, their calculation with population data, convenience samples, and other non-random samples.  There appears to be widespread lack of understanding that p-values are the probability of making one type of inferential mistake due to sampling instead of enumerating, and that their validity is based on assumptions that have implications for how samples must be drawn.  I understand these sorts of errors were intended to be covered by principle 1 in the statement, but that portion of the statement makes no mention of sampling or sampling error.  How can proper and improper use of p-values be explained without the context of sampling error? 

    Are others seeing p-values with population data and convenience samples?

    Regards,

    Ken

    Kennard T. Wing

    Senior Process Improvement Advisor

    St. Christopher's Hospital for Children, 160 East Erie Avenue, Philadelphia, PA 19134

    ------------------------------
    Ron Wasserstein
    Executive Director
    The American Statistical Association
    Promoting the Practice and Profession of Statistics
    732 N. Washington St.
    Alexandria, VA 22314
    703-684-1221 x1860



  • 15.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-16-2016 10:06

    Hello Kennard,

    Yes! And on more than one occasion; I hesitate to say how often I have seen it.

    I will illustrate with one story. A doctor came into my office to inquire about the statistics of his research. He wanted a p value. I informed him that he did not need one, because he had census data. He was very upset about this and insisted that I come up with a p value for him or he would not be able to get the report published.

    So, because it appeared to be a battle that I could not win, I wrote .05 on a piece of paper, handed it to him, and sent him on his merry way.

    Very sad for me.

    ------------------------------
    Gretchen Donahue



  • 16.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-18-2016 20:51

    Gretchen,

    I can do one worse, one that actually got me fired!

    I was working with a professor on survey data. He wrote the survey (poorly) and wanted me to analyze the data. The results were not to his liking: they showed differences between groups he "knew" were not different, and did not show differences between groups that should have been different. I reanalyzed the data 3-4 times using other methods I felt were appropriate for the data. He still didn't get the results he wanted. Obviously, it was my fault his poorly written survey didn't show the results he wanted. He ended up firing me and trying to hire someone else to give him the results he wanted; he ended up hiring an undergrad with 2-3 stats classes under their belt. When I see his paper come out in print, I will write the journal editor and ask to have it removed, if the journal editors actually care about honesty and the correctness of the analysis. (Most don't.)

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)



  • 17.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-21-2016 07:50

    This is related to the distinction between model-based and design-based analysis. In design-based analysis, inferential information does not make sense when the whole population is available; not so in model-based analysis. Lumley (2011), Section 1.1.1, has a good discussion. The term 'population model' (sometimes loosely called a 'superpopulation') is important here; see e.g. Särndal et al. (1992).

    References:
    Särndal, CE, Swensson, B, Wretman, J (1992). Model Assisted Survey Sampling. Springer.
    Lumley, T (2011). Complex surveys. Wiley.

    ------------------------------
    Tore Wentzel-Larsen
    Researcher
    Norwegian Centre for Violence and Traumatic Stress Studies
    Regional Center for Child and Adolescent Mental Health, Eastern and Southern Norway



  • 18.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-29-2016 15:04

    Thanks for a stimulating discussion, particularly the reference to Ken Brewer's 2013 Waksberg Paper.

     

    As regards "testing" using a complete count (i.e., a census), this cannot always be considered "misuse," according to Deming (Some Theory of Sampling, Dover Publications, Chapter 7, p. 252), in the section "Distinction Between Enumerative and Analytic Studies":

     

     

    "As already noted, a complete count, for enumerative purposes, possesses no error of sampling. On the other hand, for analytic uses the complete count still has a sampling error with coefficient of variation equal to (q/Np)."

     

    To me this implies that for analytical purposes testing is allowed. Or not?

     

     

    CHEERS,

     

    IWAN






  • 19.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-30-2016 10:55
    Iwan and Others:
     
    My guess is the CV is really sqrt(q/np), as I believe back then N was used for the sample size and N_p was used for the population size.
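
    A quick numerical check (my own, under the square-root reading suggested above, with N the number of units enumerated and p the proportion having the attribute) that the coefficient of variation of a complete count, treated as a Binomial(N, p) draw from an underlying process, matches sqrt(q/(Np)):

        from math import sqrt

        N, p = 10_000, 0.3               # hypothetical census size and proportion
        q = 1 - p
        count = N * p                    # enumerated count of the attribute
        cv_binomial = sqrt(N * p * q) / count     # sd over mean of a Binomial(N, p) count
        print(cv_binomial, sqrt(q / (N * p)))     # both ~0.0153; the two expressions agree
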
    Anyway, good points in this thread about a distinction between design-based and model-based analysis...
     
     
    _________________________
     
    David Bernklau
    (David Bee on Internet)
    _________________________
     
     





  • 20.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-30-2016 19:50

    A difficulty with this view is that Deming was skeptical of the validity of hypothesis testing for analytic purposes. By "analytic purposes", Deming meant prediction of future population counts. In the near term, a different set of census takers will obtain a different count, and different sets of people will become momentarily unavailable or unobservable.  In the long term demographic trends will apply.

    Conducting a hypothesis test based on a single enumeration takes neither source of variation into account: neither variation among census takers and short-term observability, nor demographic trends. Deming argued that analytic predictions need to take into account the processes by which variation arises, and trends over time.

    Deming's view has been a minority one. But he would not have advocated hypothesis tests based on a single 100% enumeration as a basis for predicting either the performance of a different census crew or future population trends.

    ------------------------------
    Jonathan Siegel
    Associate Director Clinical Statistics



  • 21.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-16-2016 05:55

    Hello everyone,

    This is great information for the lectures I am working on.  Big thanks.

    Mike Jadoo

    ------------------------------
    Michael Jadoo
    Economist
    Bureau of Labor Statistics



  • 22.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-17-2016 09:46

    I would like to see ASA actively reach out with this message to the journal editors. For a start, I would like to see this take place in biomedical journals (e.g., JAMA, NEJM), where this p < .05 is firmly entrenched. 

    Now, if we tell the journal editors to ban p-values, they will ask for better alternatives.  Are we prepared to give them that?  Are confidence intervals the remedy?  Are Bayes factors the answer?  There are more troublesome foundational issues with statistical inference than just p-values. Some of these are cogently articulated in Richard Royall's "Statistical Evidence: A Likelihood Paradigm." 

    ------------------------------
    Ravi Varadhan
    Johns Hopkins University



  • 23.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-18-2016 03:59

    In Ken Brewer's Waksberg paper for 2013, he had something to say about this in Section 3.1.

    "Three controversies in the history of survey sampling," Ken Brewer, 
    http://www.statcan.gc.ca/pub/12-001-x/2013002/article/11883-eng.htm

    ------------------------------
    James Knaub
    Lead Mathematical Statistician
    Retired



  • 24.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-18-2016 09:54

    As I see it, the fundamental problem with mistakes in statistical reasoning is that there exist three theories of inferential statistics: Fisher, Neyman-Pearson, and Bayes (see Brad Efron's freely available paper "R. A. Fisher in the 21st Century," Statistical Science, 1998, Vol. 13, No. 2, pp. 95-122).  Introductory textbooks seem to proffer their own versions of a "grand unifying theory," which I believe has led to obfuscation and deception.  It is time for the leadership of statistical societies, such as the ASA and RSS, to give us a "grand unifying theory," especially in this bewildering era of "big data science."  According to Joan Fisher Box's biography, Fisher systematized the prevailing practice of considering deviations that are two standard errors from the center as significant (a different standard error was used before Fisher; however, the concept was the same).  What is the prevailing practice that we should now embrace?  Combining evidence with p-values and P(H|D) is simply the Bayesian leg of the three-legged stool of statistics.  Give us a bean bag to sit on, or at least another leg, and make it a chair.

    ------------------------------
    Eugene Komaroff
    Professor of Education
    Keiser University



  • 25.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-22-2016 13:17

    In many cases, non-Bayesian results such as p-values based on p( D | H ) are approximately or exactly equivalent to Bayesian results based on p( H | D ) for some range of prior distributions. (E.g., see Ventura L, Cabras S, Racugno W. Default prior distributions from quasi- and quasi-profile likelihoods. Journal of Statistical Planning and Inference. 2010 Nov 30;140(11):2937-42. http://paduaresearch.cab.unipd.it/8789/1/2011_11_20110928110001.pdf) It seems to me the reasonable way to "unify" the different paths is to choose and report one or more among such priors, whether proper or improper. I predict such reporting would be almost universally ignored.

    Attribution of Bayesian meaning to non-Bayesian results is seemingly routine, and I think we should do our part to report the assumptions relevant to such attribution. But I doubt science would be much improved. And so, I think "unification" would be insufficient even if appropriate.

    ------------------------------
    Tom
    Thomas M. Davis



  • 26.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-18-2016 16:10

    I heartily agree with this....

    "I would like to see ASA actively reach out with this message to the journal editors. For a start, I would like to see this take place in biomedical journals (e.g., JAMA, NEJM), where this p < .05 is firmly entrenched."

    ...but I also think this....

    "Now, if we tell the journal editors to ban p-values, they will ask for better alternatives.  Are we prepared to give them that?  Are confidence intervals the remedy?  Are Bayes factors the answer?  There are more troublesome foundational issues with statistical inference than just p-values."

    ...is part of the reason the ASA statement (which I love) was so intentionally vague: because we, as a field, still have not agreed on that (and probably never will).  And I'm not sure there is a solution.  Suppose we settled on the idea that we would unilaterally replace p-values with all Bayesian-style analyses.  I fear that non-statistically-trained scientists would merely settle on a new version of p < .05.

    For that reason, I'm generally in the camp saying "p-values are okay as part of a comprehensive presentation of results that includes effect sizes, confidence intervals, AND p-values, and most importantly, doesn't merely boil results down to a Yes/No decision about significance."

    ------------------------------
    Andrew D. Althouse, PhD
    Supervisor of Statistical Projects
    UPMC Heart & Vascular Institute
    Presbyterian Hospital, Office C701
    Phone: 412-802-6811
    Email: althousead@upmc.edu



  • 27.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-22-2016 12:18

    I have been holding off on commenting to see what others say.  Everyone is right in their own way.  We all have p-value horror stories. The statement from the ASA is vague but useful.  But the statement and the discussion focus mostly on publication statistics.  People who work in industry don't have publication as their primary goal.  And just because computers are faster now, that doesn't make building complicated models faster when those models require sophisticated specifications, programming, and resource buy-in within and outside the corporation.  Industrial and business statisticians, whether in pharma, advanced manufacturing, finance, or consumer goods, do a lot of similar, but distinct, things over and over (such as machine calibrations and clinical lab tests).

    It is easy for a full professor to advocate a multi-level Bayesian model with arcane priors for a large social science study, far removed from stockholders, stakeholders, shifting markets, government and NGO regulations, etc.  So Neyman-Pearson hypothesis testing will stay in our toolbox.  We have had much success in quality, often with cutoffs much lower than 0.05.  The first principle in the statement, "P-values can indicate how incompatible the data are with a specified statistical model," is incredibly useful in pushing aside irrelevant concerns of stakeholders and making the end model more robust.  Quite often the data are 'compatible' with the specified statistical model and we can move on.  Also, for low-variance situations, the use of non-inferiority and equivalence approaches has greatly improved decision-making.

    ------------------------------
    Georgette Asherman
    Principal
    Direct Effects, LLC



  • 28.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-27-2016 09:05

    Dear ASA colleagues,

    I'm posting the message below on behalf of a non-member.  I will pass along back to him your comments posted on this site.

    Thank you.

    Ron

    Subject: Beyond just P-values

    Ron:

    I am a scientist working in the life sciences where P-values have been debated for several decades.  I applaud ASA's position on P-values and the challenge to journal editors to help transition away from these traditional methods.

    I am writing to suggest that confidence intervals of various kinds are tied to many of the same limitations as P-values (e.g., "the CI overlapped zero, therefore the effect is not significant").  I do not view CIs as a fundamental part of the future.  Some fields avoid Bayesian approaches even when nearly uninformative priors are used, partially due to the complexity involved.  What else is there?

    I submit there is a fairly new paradigm: "information-theoretic."  These methods stem from the famous Japanese statistician Hirotugu Akaike, who derived inferential approaches based on Kullback-Leibler information theory and entropy.  These methods can be thought of as "extended likelihood theory."  While rooted in deep theory, these approaches are simple to compute and understand. They provide quantities such as

          L(g | data),     Prob{g | data},     odds ratios of models i and j,

                  where L is likelihood and g is the model.

    Note that the model probabilities are not conditioned on the null, but on the data: a substantial advantage.  Knowing the model probabilities allows several ways to make formal inferences based on all the models under consideration (multimodel inference).  A variance component due to model-selection uncertainty is easy to incorporate into estimates of precision.  This class of methods is free from any notion of priors on models or priors on parameters.
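
    As one concrete route to the model probabilities mentioned above, here is a minimal sketch (mine, with simulated data and statsmodels assumed available) that turns AIC differences for two competing regression models into Akaike weights:

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(2)
        x = rng.normal(size=100)
        y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=100)    # hypothetical data

        m1 = sm.OLS(y, sm.add_constant(x)).fit()               # model g1: intercept + slope
        m0 = sm.OLS(y, np.ones_like(x)).fit()                  # model g0: intercept only
        aic = np.array([m1.aic, m0.aic])
        delta = aic - aic.min()
        weights = np.exp(-delta / 2) / np.exp(-delta / 2).sum()    # Akaike weights, a Prob{g | data} analogue
        print(dict(zip(["slope model", "intercept only"], weights.round(3))))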

    Why not more consideration of this new paradigm as the P-value culture is replaced?

    Thank you,

    David R. Anderson

    Colorado State University

    ------------------------------
    Ron Wasserstein
    Executive Director
    The American Statistical Association
    Promoting the Practice and Profession of Statistics
    732 N. Washington St.
    Alexandria, VA 22314
    703-684-1221 x1860



  • 29.  RE: Message to ASA Members on ASA P-Value statement from Jessica Utts and Ron Wasserstein

    Posted 03-30-2016 12:33

    Hi,

    I'm glad that this discussion has been started, but I don't think we are addressing the correct issues.  IMHO, the idea of using the p-value to assess the correctness (acceptability/adequacy) of a hypothesis is good.  The way it is being used and taught is bad.

    When you look at the overlap of two distributions and see that they overlap by more than 95%, and say that the two distributions are not significantly different, it may or may not be a valid statement, depending on the practical significance of the tails that don't overlap.  But when we use the T-test to compare, e.g., a Pareto distribution with a Weibull, that's a problem.

    When we apply Box-Cox, log, or logit transformations to long-tailed distributions, look at the overlap of the transformed distributions, and conclude that they overlap by more than 95%, that is wrong.

    When we hide behind the Central Limit Theorem to make the distribution we are dealing with look normal, that is wrong.

    Anscombe's Quartet is another example of an incorrectly applied p-value.

    But the p-value is not a wrong concept.  We have just been misusing it for the last 100 years.

    ------------------------------
    Best Regards,
    Alex Gilgur