Choosing the number of tails for a test in planning a study

  • 1.  Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 15:59
    Hello all,
    I am working with some others on a project and we are looking for real life examples where the choice of the number of tails to use in designing a study is a question.  Specifically, we were wondering if anyone has examples where the statistician or scientist would have preferred a one-tailed hypothesis, but used a two-tailed hypothesis in planning because there was a chance of a result in the "non-preferred" direction.  Likewise, we would be interested in examples where a one-tailed hypothesis was used in planning, even if the preference would have been for two-tailed.  I appreciate any real examples people can contribute.


    -------------------------------------------
    Robert Podolsky
    Associate Professor
    Wayne State University
    -------------------------------------------


  • 2.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:07

    In our practice, we always use 2-sided methods, whether or not our investigators have a one-sided interest. Even drug company protocols with planned submission to the FDA are universally two-sided, despite the one-sided interest. It is true that power is sacrificed by this, but uniformity is achieved.
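
    To put a rough number on the power trade-off, here is a minimal sketch. It assumes a two-sample z-test on means, a standardized effect size of 0.5, and 80% power; the function name n_per_group and all inputs are illustrative assumptions, not taken from any actual protocol.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80, two_sided=True):
    """Approximate per-group n for a two-sample z-test on means.

    effect_size is the standardized difference (difference / common SD).
    For the two-sided case the tiny chance of rejecting in the wrong
    tail is ignored, as is usual in planning formulas.
    """
    z = NormalDist().inv_cdf
    tail_alpha = alpha / 2 if two_sided else alpha
    return ceil(2 * (z(1 - tail_alpha) + z(power)) ** 2 / effect_size ** 2)


n_two = n_per_group(0.5, two_sided=True)    # two-sided, total alpha = 0.05
n_one = n_per_group(0.5, two_sided=False)   # one-sided, alpha = 0.05
print(n_two, n_one)
```

    Under these assumptions the two-sided plan needs 63 per group against 50 for the one-sided plan. A one-sided test at 0.025 gives the same n as the two-sided test at 0.05.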
    -------------------------------------------
    Jon Shuster
    University of Florida
    -------------------------------------------








  • 3.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:15


    Hi Jon,
    In my former life as a biostatistician at the University of Pennsylvania Medical Center, the statistical tests were developed based on what we needed to show. If we had to show that a certain treatment was better than placebo or than the current treatment, we might use a one-sided test, particularly if we had a smaller sampling population, say for a rare disease or an expensive treatment, and didn't want to waste power on detecting a result that was not of interest.
    Christine

    -------------------------------------------
    Christine Hardy
    Communications Media, Inc.
    -------------------------------------------








  • 4.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:29
    Uh, shouldn't the nature of the test (one-sided or two-sided) be determined by the objective of the experiment?  If the researcher is interested in changes in the response in one direction only, then a one-sided test is called for (e.g., increasing abrasion resistance of a thermoplastic).  OTOH, if any change is of interest, then a two-sided test is called for (e.g., comparing a new method of chemical analysis with a standard method).

    -------------------------------------------
    Wayne Fischer
    Statistician
    University of Texas Medical Branch
    -------------------------------------------








  • 5.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:48
    Thank you to everyone who has responded with all of the comments. My approach to deciding whether to use one- vs two-sided hypotheses is based on asking one important question, even if the investigator is confident they are only interested in a result in one direction: Would you want to know if a difference that went in the opposite direction is significant?  I have rarely had an investigator answer "no" to that question.  As such, I end up using a two-sided hypothesis, even when the investigator would prefer one-sided.  I believe it is this sort of rationale that leads to the practice that Jon mentioned.  Does the FDA require two-sided hypotheses for uniformity or for some other rationale, perhaps similar to my reasoning above? 

    -------------------------------------------
    Robert Podolsky
    Associate Professor
    Wayne State University
    -------------------------------------------








  • 6.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:55
    Robert,
    Page 25 of the ICH E9 guidelines gives some indication of FDA's thought process.
    "The issue of one-sided or two-sided approaches to inference is controversial and a diversity of views can be found in the statistical literature. The approach of setting type I errors for one-sided tests at half the conventional type I error used in two-sided tests is preferable in regulatory settings. This promotes consistency with the two-sided confidence intervals that are generally appropriate for estimating the possible size of the difference between two treatments."

    I like the emphasis on estimation in the pragmatic guidelines they give, since most people can easily understand the idea of an estimate +/- some margin of error.


    -------------------------------------------
    Rickey Carter
    Associate Professor of Biostatistics
    Mayo Clinic
    -------------------------------------------








  • 7.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:55

    There is one possible reason for sticking with 2-sided hypotheses. In some cases, a result will be significant if considered one-sided and not significant in the two-sided case, and thus the notion of "structuring the test post-hoc" arises. If all tests are uniform, this is not an issue.
    -------------------------------------------
    Paul Thompson
    Director, Methodology and Data Analysis Center
    Sanford Research/USD
    -------------------------------------------








  • 8.  RE:Choosing the number of tails for a test in planning a study

    Posted 11-05-2013 15:02
    The decision space is really divided into 3 areas: < = >. When deciding whether to use a one- or two-sided test, it is helpful to ask what actions will be taken for the three areas. If only two actions are possible, that is, the same action is taken for one of the tails and for the "=", then a one-sided test is in order. If there are three actions (decisions/conclusions), then a two-sided test is in order, with a follow-on post-hoc look at which of the two tails indicates which decision should be taken.  While many situations warrant two-tailed tests, this is not always the case.

    -------------------------------------------
    Dr. N. Shirlene Pearson
    Statistical Consultant & Research Support Specialist
    Southern Methodist University
    Dallas, Texas USA
    -------------------------------------------








  • 9.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 12:02
    To respond to the original question, Yes, I can recall at least one intramural protocol reviewer who requested that we change our test to a two-sided test, even though the research question clearly called for a one-sided hypothesis.  And Yes, the reviewer's reasoning was that we had to allow for the possibility that the agent could unexpectedly cause significant harm instead of the expected significant benefit.  

    Ever since that incident, I've had a question of my own, which I will ask here: Would it be appropriate in this circumstance to use a two-tailed test with unequal tails? In other words, put 80% of your alpha into the tail that's in the expected direction, and only 20% of your alpha into the tail that's in the unexpected direction?  So that, if total alpha is 0.05, then the significance level is 0.04 for differences in the expected direction, but 0.01 for differences in the unexpected direction?  (disclaimer: 80% and 20% were chosen merely to simplify the writing.)
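
    As a minimal sketch of what the 0.04 / 0.01 split would mean in practice, assume a z statistic whose positive direction is the expected one. The function name split_tail_decision and the example z values are hypothetical.

```python
from statistics import NormalDist


def split_tail_decision(z_stat, alpha_expected=0.04, alpha_unexpected=0.01):
    """Classify a z statistic under a split-tailed test.

    Positive z is taken to be the expected direction (an assumption).
    """
    nd = NormalDist()
    upper_crit = nd.inv_cdf(1 - alpha_expected)   # about 1.75 for 0.04
    lower_crit = nd.inv_cdf(alpha_unexpected)     # about -2.33 for 0.01
    if z_stat >= upper_crit:
        return "significant in expected direction"
    if z_stat <= lower_crit:
        return "significant in unexpected direction"
    return "not significant"


for z_stat in (1.8, -2.0, -2.4):
    print(z_stat, split_tail_decision(z_stat))
```

    Compared with the usual equal-tailed cutoffs of about +/-1.96, z = 1.8 becomes significant in the expected direction, while z = -2.0 no longer counts as significant in the unexpected direction. The total Type I error rate stays at 0.05.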

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences of Biostatistics
    -------------------------------------------








  • 10.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 12:17
    Mathematically,  it is fine to choose unequal tail areas for hypothesis tests.  But one reason that the FDA insists on two-sided tests (besides the uniformity argument) is that in addition to hypothesis tests,  one always wants to provide estimates and confidence intervals for the treatment effect and ideally, the CI should align exactly with the hypothesis test.  It is difficult to explain one-sided confidence intervals to non-statisticians since they are usually interpreted in the Bayesian way - i.e. the true value falls within the interval with xx probability. 

    -------------------------------------------
    Roy Tamura
    Associate Professor
    University of South Florida
    -------------------------------------------








  • 11.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 12:26
    Uneven tails...interesting...I don't see why not if it can be effectively justified. Of course you'd want to consider your differential power given opposite direction effects...perhaps of differing effect sizes.

    Not sure...comes back to whatever problem was being solved by asymmetric tails, I suppose. Not sure what phenomena would match that kind of expectation...excluding the middle, power to detect both superiority or inferiority.

    In an equivalence trial it would probably be reasonable to have a null with unequal 95% half-widths, if justified.  Perhaps bioequivalence of a generic antibiotic to the original...more important to get enough than to worry about too much.

    That would actually be nice in terms of advancing the approach...rather than simply 80%-120% of reference...but considerate of the application. I remember a discussion a collaborator had with the FDA ages ago touching on the distinction between antibiotics and endocrine drugs with regards to applying the same standards for generics.  My info may be dated, though.



    -------------------------------------------
    Jason T. Machan
    Director, Lifespan Biostatistics Core, Lifespan Hospital System
    Research Scientist, Biostatistics, Research Rhode Island Hospital
    Assistant Professor, Departments of Orthopaedics and Surgery
      The Warren Alpert Medical School, Brown University
    Director Biostatistics Externship, Adjunct Assistant Professor,
      Department of Psychology, University of Rhode Island

    -------------------------------------------








  • 12.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 14:53
    Unequal tails -- that is quite interesting.  In my opinion it would certainly be something that you could intellectually justify, but I suspect that the reviewer that you mentioned in your introductory paragraph would respond poorly simply because it wasn't conventional. There are some folks who are rather rigid, and this might be one of them.

    -------------------------------------------
    David Mangen
    Owner
    Mangen Research Associates, Inc.
    -------------------------------------------








  • 13.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 18:05
    I never thought about or saw anyone suggest unequal tails before. It seems to me like stacking the deck: I'll use a larger alpha for the finding that I want (to increase the chances of saying my result was significant) and a smaller one for the finding I don't want (to reduce the embarrassment of a significant result in the nonpredicted direction). The size of alpha is somewhat arbitrary anyway, but it seems to me intuitively that the tails should be equal to be fair, shouldn't they? Not to be rigid, but to avoid bias in favor of what we prefer. Or just do the one-tailed test.

    -------------------------------------------
    Annette Gourgey
    CUNY
    -------------------------------------------








  • 14.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 18:16
    When teaching here at Winona State, I usually mention the fact that 0.025 on each side is not necessarily set in stone.  If the consequence of a Type I error is indeed asymmetric, then should we not have this reflected in how much error we are willing to put on each side of the distribution?  I have thought for some time now that one-sided tests and intervals have a place in our teaching.  In my opinion, this forces some focus on the potential consequences of the tests we are performing, and thus students/practitioners need to think beyond "p-value less than 0.05".

    Just adding my $0.02 worth,
    -------------------------------------------
    Christopher Malone
    Winona State University
    -------------------------------------------








  • 15.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 18:49
    Not all confidence intervals are symmetric. A client once hired me to provide a confidence interval for a single binomial proportion of (or near) 0.0% or 100.0%.  There are several binomial CI estimates available.
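
    To illustrate the asymmetry at the boundary, here is a minimal sketch of two of the available estimates: the exact (Clopper-Pearson) interval, which has a closed form when the count is 0 or n, and the Wilson score interval. The function names and the example of 0 events in 30 trials are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def exact_interval_at_boundary(successes, n, conf=0.95):
    """Clopper-Pearson interval for the special cases successes == 0 or n."""
    alpha = 1 - conf
    if successes == 0:
        return 0.0, 1 - (alpha / 2) ** (1 / n)
    if successes == n:
        return (alpha / 2) ** (1 / n), 1.0
    raise ValueError("closed form shown here only covers 0 or n successes")


def wilson_interval(successes, n, conf=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


print(exact_interval_at_boundary(0, 30))
print(wilson_interval(0, 30))
```

    With an observed proportion of 0%, both intervals start at (essentially) zero and extend upward only, so neither is symmetric about the point estimate.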



    -------------------------------------------
    Chris Barker, Ph.D.
    Consultant and
    Adjunct Associate Professor of Biostatistics
    www.barkerstats.com

    ---
    "In composition you have all the time you want to decide what to say in 15 seconds, in improvisation you have 15 seconds."
    -Steve Lacy
    -------------------------------------------








  • 16.  RE:Choosing the number of tails for a test in planning a study

    Posted 11-03-2013 13:16
    Do you ever discuss the Bayesian + decision theory solution -- produce a posterior distribution and minimize a loss function / maximize a utility function? That forces one to think about what you're going to do with the information and put the potential consequences front and center.
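
    A minimal sketch of that idea, under strong simplifying assumptions: the posterior for the treatment effect is taken to be normal, there are only two actions ("adopt" or "hold"), and the loss is linear in the size of the mistake. The function names, the weights, and the posterior values are all hypothetical.

```python
from statistics import NormalDist


def expected_losses(post_mean, post_sd, harm_weight=1.0, miss_weight=1.0):
    """Expected loss of 'adopt' vs 'hold' for a normal posterior on the effect.

    Loss if we adopt and the true effect is negative: harm_weight * |effect|.
    Loss if we hold and the true effect is positive:  miss_weight * effect.
    """
    nd = NormalDist()
    t = post_mean / post_sd
    density = post_sd * nd.pdf(t)
    exp_positive_part = post_mean * nd.cdf(t) + density      # E[max(effect, 0)]
    exp_negative_part = -post_mean * nd.cdf(-t) + density    # E[max(-effect, 0)]
    return {"adopt": harm_weight * exp_negative_part,
            "hold": miss_weight * exp_positive_part}


def best_action(post_mean, post_sd, **weights):
    """Pick the action with the smaller expected loss."""
    losses = expected_losses(post_mean, post_sd, **weights)
    return min(losses, key=losses.get)


print(best_action(0.5, 1.0))                     # symmetric losses
print(best_action(0.5, 1.0, harm_weight=5.0))    # harm counted 5x as costly
```

    With the same posterior, the preferred action flips once harm is weighted more heavily than a missed benefit. The asymmetry in consequences enters through the loss function rather than through a choice of tails.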

    As an aside, I think it's a shame that Bayesian methods are generally relegated to graduate-level courses; I think they're a lot easier to understand than frequentist approaches.

    -------------------------------------------
    Kevin Van Horn
    -------------------------------------------





  • 17.  RE:Choosing the number of tails for a test in planning a study

    Posted 11-04-2013 08:08
    I had been taught to use two-tailed tests at the outset. But when I started quality improvement research in health services, I decided that one-tailed was good because we were only looking for an improvement. However, the more I did these studies, the more I found evidence that some interventions were deleterious as well. The intervention caused a problem that had been unforeseen. This was as important to know as the improvement. So, I'm back to two-tailed tests.
    I think there is a place for them in certain situations.  The theory supports them.    



    -------------------------------------------
    Dorothy Syblik
    -------------------------------------------








  • 18.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 20:25
    Interim analyses are often performed with asymmetric boundaries, with a different "alpha" used for the futility boundary.

    -------------------------------------------
    David Bristol
    Statistical Consulting Services
    -------------------------------------------








  • 19.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 21:26
    There is a literature on what has been termed split-tailed tests (e.g., http://onlinelibrary.wiley.com/doi/10.1002/0470013192.bsa749/abstract) which may help to shed some light on this issue for some of us.

    -------------------------------------------
    Kevin O'Grady
    Affiliate Faculty
    University of Maryland
    -------------------------------------------








  • 20.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-29-2013 23:23
    Robert Abelson also talks about this in his book, Statistics as Principled Argument, using colorful terms like the "tail and a half test" and the "lopsided test".

    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------


  • 21.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-30-2013 09:23
    Deciding on "one-tailed vs. two-tailed" is something you do at the planning stage. I discussed the issue in:

    O'Brien RG, Castelloe JM (2007), "Sample-Size Analysis for Traditional Hypothesis Testing: Concepts and Issues," in Dmitrienko A, Chuang-Stein C, D'Agostino R (Ed.), Pharmaceutical Statistics Using SAS: A Practical Guide, Cary, NC: SAS Press, 237-271. Get it here: http://hal.case.edu/~robrien/O'BrienCastelloe07.pdf

    As for using, say, 0.01 and 0.04 as the two alphas, this conforms to the recommendations in this cool paper by two outstanding statistical thinkers:

    Rosenthal, R. and Rubin, D. B. (1983). Ensemble-adjusted p values. Psychological Bulletin, 94(3):540-541. Get it here: http://hal.case.edu/~robrien/Rosenthal83Ensemble-adjusted%20p%20values..pdf 


    The bottom line for me is ...

    The standard two-sided test is nothing but conducting two one-sided tests and correcting using alpha/2, a la Bonferroni. To say we should always do this is dogma I have never understood. Now I fight it. 

    We need to form analyses around the given research questions. Often the question is, Is A123 BETTER/GREATER than B456? If so, then both "A123 is 'ESSENTIALLY EQUIVALENT' to B456" and "A123 is WORSE/LESS than B456" make up a single set of "A123 is NOT BETTER / NOT GREATER than B456." This is a single question, hence no need to adjust for multiple testing. In my view, the frequentist should just find the lower limit of the upper 1-alpha (one-sided) CI, and if you really want to, the one-tailed p-value.
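
    A minimal sketch of that recipe, assuming a normally distributed estimate with known standard error and a research question of the form "is the effect greater than 0?". The function name one_sided_summary and the example numbers are hypothetical.

```python
from statistics import NormalDist


def one_sided_summary(estimate, std_error, alpha=0.05):
    """One-tailed p-value and lower confidence limit for H1: effect > 0."""
    nd = NormalDist()
    z = estimate / std_error
    p_one = 1 - nd.cdf(z)                  # upper-tail p-value
    p_two = 2 * min(p_one, nd.cdf(z))      # conventional two-sided p-value
    lower_limit = estimate - nd.inv_cdf(1 - alpha) * std_error
    return {"p_one": p_one, "p_two": p_two, "lower_limit": lower_limit}


result = one_sided_summary(estimate=1.8, std_error=1.0)
print(result)
```

    The lower limit of the one-sided 1-alpha interval is above 0 exactly when the one-tailed p-value is below alpha, so the interval and the test tell the same story. For a symmetric statistic like this one, the two-sided p-value is simply double the smaller one-tailed p-value.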


    Two stories. (Hey, I'm 64; I've got too many stories.)

    23 years ago, I helped design a trial to test a drug for a very rare genetic disease that had no known effective treatment. We needed to better balance Type I and Type II error rates, so I proposed doing a one-tailed test, maybe even at alpha = 0.10; I can't remember exactly. Heresy! Would the FDA balk? I took the pro-active route by going to DC and presenting my rationale to FDA biostatisticians face-to-face. The first hour I met with just a single person and presented my case. He then called three others and we went through it all again. After these 2.5 hours, my strategy was given thumbs up and we conducted the trial accordingly. My point? The trial had special needs so the statistical planning required custom care. BUT MANY STUDIES HAVE SPECIAL NEEDS AND THUS REQUIRE SPECIAL CARE. If the research question is one-sided, so be it. I teach this in a "certificate" course in clinical trials put on by UCSF, a course that is loaded with FDA and industry people both as students and instructors.

    About the same time, I co-authored a report of another small trial, which was published in the Annals of Internal Medicine. The written protocol had called for one-tailed tests, so that's what we reported. The statistical editor balked and, in a long phone call I had with him, stated that this journal never allowed such things, because investigators might cheat by claiming a one-tailed hypothesis after seeing that their two-tailed p-value came out between 0.05 and 0.10. But we had a written protocol! For this particular issue, it made no difference. I just doubled all the p-values in the paper and they were all still below 0.05, so everyone was blissfully happy. So silly.

    And finally ...

    Arguably--Oh, should I bring this up here?--the issue almost goes away when you "go Bayes." (Yes, Jason Connor, I fully 'get it' now and so do my students.) But I'll leave all that for another polemic.

    -Ralph

    -- 
    Ralph O'Brien, PhD
    Professor of Biostatistics
    Interim Director, CCI Statistical Sciences Core
    Dept of Epidemiology and Biostatistics
    Case Western Reserve University
    Office: 216.368.1927; Cell: 216.312.3203

    "Tell me and I forget, teach me and I may remember, involve me and I learn."
            ― Benjamin Franklin








  • 22.  RE:Choosing the number of tails for a test in planning a study

    Posted 11-05-2013 06:23
    I thought that a different kind of example may be interesting to you. Basic science Investigators screening to find new interventions are likely to be a class of clients who are not interested in two-sided tests. Broad examples are screening potential antigens for malaria or potential repellents for mosquitoes. If the Investigator can't show a strong improvement under artificially controlled conditions, the potential intervention isn't worth their effort for future development.

    -------------------------------------------
    Charles White (Chuck)
    CEW Biostatistical Consulting
    -------------------------------------------








  • 23.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:35
    Two-tailed when one-tailed desired: Virtually any time the 2-tailed p falls between .05 and .099999999...All kidding aside, in many of the fields in which I publish, the non-statisticians are extremely suspicious of 1-tailed p-values...and there's a part of me that's okay with the practical implications, which is to be a bit more conservative, when so many other interactions push for being decidedly liberal.

    One-tailed even though two-tailed desired: I would place non-inferiority trials in that domain.

    For example, let's write a grant for a new surgical technique to fix/replace a ruptured ACL. The current clinical gold standards do a darn good job of recapitulating functionality (see the sports world)...Don't go trying to design a primary aim 1 to demonstrate superiority on strength, etc. However, there are other things going on following those injuries and surgeries for which there IS room for improvement...cartilage health, for one. So, the grant may look like this:

    PA1 - non-inferiority of new vs. old on clinical metrics of ACL.  My technique MAY be same, but I'm not about to power for equivalence...and even if it's superior, it's not likely going to be clinically meaningful...so let's just make sure it's not appreciably worse than the current standard, powered to 80%, 90%, 95%...whatever
    PA2 - superiority of new vs. old on cartilage metrics...



    -------------------------------------------
    Jason T. Machan
    Director, Lifespan Biostatistics Core, Lifespan Hospital System
    Research Scientist, Biostatistics, Research Rhode Island Hospital
    Assistant Professor, Departments of Orthopaedics and Surgery
      The Warren Alpert Medical School, Brown University
    Director Biostatistics Externship, Adjunct Assistant Professor,
      Department of Psychology, University of Rhode Island
    -------------------------------------------








  • 24.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:36
    Sometimes it does matter to consider the problem as one-sided, albeit at the .025 level of significance.  The discrete case comes to mind......

    -------------------------------------------
    Jeffrey Finman
    -------------------------------------------








  • 25.  RE:Choosing the number of tails for a test in planning a study

    Posted 10-28-2013 16:47
    Smoking is one of those exposures found to be protective at times in some osteoarthritis literature, contrary to other epidemiological findings.  

    (If in fact protective findings are false, this may be a result of many reasons: study design, exposure/outcome definitions, lack of info on confounders, etc... but I don't intend to go on a tangent).

    -------------------------------------------
    Emmeline Sangeorzan
    Biostatistician
    Arthritis Research Institute of America
    Clearwater, FL
    -------------------------------------------