
Probability question

  • 1.  Probability question

    Posted 01-11-2015 17:42
    I am stuck on a probability question. I have run two experiments that measure the same thing and have two p-values. The experiments are independent, so I was thinking that the probability of seeing two p-values of the sizes that I found is the product of the two p-values. But minus the log of the product of two independent uniform random variables (which is what p-values are under the null hypothesis) is distributed Gamma(2,1), and that distribution gives a larger p-value than the product of the p-values. What am I not seeing?

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------
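    The two numbers being compared can be computed side by side. This is a minimal sketch, assuming two independent p-values that are Uniform(0,1) under the null. It uses the closed-form survival function of Gamma(2,1), P(G ≥ t) = e^{-t}(1 + t), evaluated at t = -ln(p1*p2). The function name and the p-values 0.08 and 0.06 are hypothetical.

```python
import math


def gamma2_tail(p1: float, p2: float) -> float:
    """Upper-tail probability P(G >= t) for G ~ Gamma(shape=2, scale=1),
    evaluated at t = -ln(p1 * p2)."""
    t = -math.log(p1 * p2)
    # Closed-form survival function of Gamma(2, 1): exp(-t) * (1 + t)
    return math.exp(-t) * (1.0 + t)


# Hypothetical p-values from two independent experiments
p1, p2 = 0.08, 0.06
print(f"product of p-values:     {p1 * p2:.4f}")
print(f"Gamma(2,1) tail p-value: {gamma2_tail(p1, p2):.4f}")
```

    With these made-up inputs the product is 0.0048, and the Gamma(2,1) tail probability is about 0.030. Because exp(-t) equals p1*p2, the tail probability is always the product multiplied by the factor (1 + t), and that factor is greater than 1.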


  • 2.  RE: Probability question

    Posted 01-11-2015 18:22
    I think you are attempting to equate Pr(X1>x1 and X2>x2) = (1-x1)(1-x2) (when X1 and X2 are independent uniform random variables) with Pr((1-X1)*(1-X2) > (1-x1)*(1-x2)).
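    A small simulation can make the distinction between the two events concrete. The sketch is written directly in terms of the p-values rather than the X variables above. It assumes both p-values are independent Uniform(0,1) under H0 and uses hypothetical observed values p1 = 0.08 and p2 = 0.06.

    - Event A: both p-values are at least as small as observed. This event has probability p1*p2.
    - Event B: the product is at least as small as observed. This is what the Gamma(2,1) calculation measures.

```python
import random


def estimate_events(p1: float, p2: float, n_sims: int = 200_000, seed: int = 1):
    """Monte Carlo estimates of two different events for independent
    Uniform(0,1) p-values P1, P2 (their distribution under H0)."""
    rng = random.Random(seed)
    both_small = 0      # event A: P1 <= p1 and P2 <= p2 (a rectangle)
    product_small = 0   # event B: P1 * P2 <= p1 * p2 (region under a hyperbola)
    target = p1 * p2
    for _ in range(n_sims):
        u1, u2 = rng.random(), rng.random()
        if u1 <= p1 and u2 <= p2:
            both_small += 1
        if u1 * u2 <= target:
            product_small += 1
    return both_small / n_sims, product_small / n_sims


prob_a, prob_b = estimate_events(0.08, 0.06)
print(f"P(A) ~ {prob_a:.4f}   (exact p1*p2 = {0.08 * 0.06:.4f})")
print(f"P(B) ~ {prob_b:.4f}")
```

    Event A is contained in event B, so B always has the larger probability. Here the estimate for B should land near 0.030, not near 0.0048.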

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------




  • 3.  RE: Probability question

    Posted 01-11-2015 18:27
    I guess I am wondering which of the two would be appropriate to use as the p-value for combining both tests.

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------


  • 4.  RE: Probability question

    Posted 01-11-2015 18:30
    The first is a calculation of the probability that two independent events occur simultaneously, which does not involve any distribution theory; the second is a statement about distribution theory.

    Let T1 and T2 be the two test statistics, with observed values to1 and to2, respectively, where the p-values are p1 = P(T1 ≥ to1) and, similarly, p2 = P(T2 ≥ to2). The first statement is that P(T1 ≥ to1 and T2 ≥ to2) = p1*p2, since T1 and T2 are independent. The value p1*p2 is a realization of the random variable P1*P2, where -log(P1*P2) ~ Gamma(2,1) under H0. I'm not sure what p-value you computed based on this distribution, but I'm guessing that it is P[Gamma(2,1) ≥ the observed value -log(p1*p2)]; it is not surprising that this is not equal to the observed value p1*p2.
    The technical details above may not be perfect, but, unless I missed something, I think you'll get the drift.
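    The distributional statement can be spelled out in a few lines. The derivation assumes P1 and P2 are independent and exactly Uniform(0,1) under H0, which holds for continuous test statistics.

```latex
\begin{aligned}
&P_i \sim \mathrm{Uniform}(0,1)\ \text{independently}
  \;\Rightarrow\; E_i = -\ln P_i \sim \mathrm{Exp}(1), \qquad i = 1, 2,\\
&-\ln(P_1 P_2) = E_1 + E_2 \sim \mathrm{Gamma}(2,1), \qquad f(s) = s\,e^{-s},\ s > 0,\\
&\Pr\bigl(E_1 + E_2 \ge t\bigr) = \int_t^{\infty} s\,e^{-s}\,ds = e^{-t}\,(1 + t),\\
&t = -\ln(p_1 p_2) \;\Rightarrow\; \Pr\bigl(P_1 P_2 \le p_1 p_2\bigr) = p_1 p_2\,\bigl(1 - \ln(p_1 p_2)\bigr) \;>\; p_1 p_2 .
\end{aligned}
```

    The last line is the combined p-value the Gamma(2,1) approach produces. It equals the product p1*p2 multiplied by a factor greater than 1. Equivalently, -2ln(P1*P2) is chi-square with 4 degrees of freedom, which is Fisher's combination statistic for two tests.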

    David
    -------------------------------------------
    David Bristol
    Statistical Consulting Services
    -------------------------------------------




  • 5.  RE: Probability question

    Posted 01-11-2015 18:42
    Thank you David. So I should use the distributional approach?

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------


  • 6.  RE: Probability question

    Posted 01-11-2015 20:17
    Margot,
    I think that the distribution theory should be used and that, up to a transformation, the product of the p-values is the observed value of the test statistic.

    -------------------------------------------
    David Bristol
    Statistical Consulting Services
    -------------------------------------------




  • 7.  RE: Probability question

    Posted 01-11-2015 20:33
    The product of the two p-values is not a legitimate way to directly combine the two p-values.

    If it were, imagine extending the approach to 5 experiments, each with p = 0.5 (i.e., absolutely no indication that the null hypothesis is false). By the obvious extension of the proposed approach, p_combined = 0.5^5 = 0.031 < 0.05. In other words, if you were to combine a series of experiments this way, the p-values would all be < 1, and the product of enough of them would eventually be less than any chosen alpha.

    I believe the gamma distribution you described is a reasonable approach. I don't know whether it is optimal in any sense.
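    The five-experiment example can be checked numerically. Below is a minimal sketch of Fisher's method, the approach linked later in the thread. It assumes independent tests with continuous p-values. The function name is hypothetical, and only the Python standard library (3.8+ for math.prod) is used.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df under H0
    (independent tests, continuous p-values).

    For even degrees of freedom the chi-square survival function has the
    closed form P(chi2_{2k} >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)**j / j!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(k))


five_nulls = [0.5] * 5
print(f"naive product:     {math.prod(five_nulls):.5f}")
print(f"Fisher combined p: {fisher_combined_p(five_nulls):.4f}")
```

    The naive product is 0.03125. Fisher's combined p-value for five p-values of 0.5 comes out near 0.73, so five null-looking results do not add up to significance. For k = 2 the function reduces to p1*p2*(1 - ln(p1*p2)), the Gamma(2,1) tail from the original question.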
     
    Mike Morton




  • 8.  RE: Probability question

    Posted 01-11-2015 20:39
    This is a standard topic in meta-analysis, and there are several good
    sources. Here's a relevant Wikipedia page:

    http://en.wikipedia.org/wiki/Fisher%27s_method

    Nagaraj




    --
    Nagaraj K. Neerchal, PhD.
    Professor of Statistics
    Chair, Dept of Math and Stat
    Director, Center for Interdisciplinary Research and Consulting
    UMBC, Baltimore, MD 21250

    Skill in Action is Yoga. (BG 2:50)





  • 9.  RE: Probability question

    Posted 01-11-2015 21:38
    Thank you David, James, and Nagaraj for your comments. They are helpful. The link that you provided answers my question, Nagaraj. Thank you again.

    Margot

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------


  • 10.  RE: Probability question

    Posted 01-12-2015 05:47
    If the group's response was helpful for you to find a solution, please post a summary of the solution so others can benefit as well. I'm curious about what you ended up doing.

    Thanks!

    -------------------------------------------
    Eric Vance
    CNSL Chair
    LISA (Virginia Tech's Laboratory for Interdisciplinary Statistical Analysis)
    Director and Associate Research Professor
    Blacksburg VA, United States
    -------------------------------------------




  • 11.  RE: Probability question

    Posted 01-12-2015 12:03
    The consensus is that either of two methods would be appropriate:

    - Fisher's method of combining p-values, from the Wikipedia page that Nagaraj linked to and the paper that Kevin O'Brien linked to.
    - The method of combining z-scores (Stouffer's Z-score method) described on the same Wikipedia page.

    Nagaraj's comment that combining p-values falls under meta-analysis is helpful too.
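    The z-score option is usually called Stouffer's Z-score method. This minimal sketch assumes independent one-sided p-values, as in a one-sided test. It uses statistics.NormalDist from the Python standard library (3.8+). The function name and the inputs are hypothetical.

```python
from statistics import NormalDist


def stouffer_combined_p(pvalues):
    """Stouffer's Z-score method for independent one-sided p-values.

    Each p_i becomes z_i = Phi^{-1}(1 - p_i); under H0 the sum of k
    independent standard normals divided by sqrt(k) is standard normal.
    """
    std_normal = NormalDist()  # mean 0, sd 1
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    z_combined = sum(z_scores) / len(z_scores) ** 0.5
    return 1.0 - std_normal.cdf(z_combined)


print(f"Stouffer combined p: {stouffer_combined_p([0.08, 0.06]):.4f}")
```

    For the hypothetical pair (0.08, 0.06) this gives roughly 0.018. Fisher's method gives about 0.030 for the same pair. The two methods weight the evidence differently, so they need not agree.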

    Margot

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------




  • 12.  RE: Probability question

    Posted 01-12-2015 19:46
    I have a question about Fisher's method. For i = 1 to k, let p(i) be the p-value arising from the i-th of k tests. Under a global null hypothesis of no effect among the k tests, both p(i) and 1-p(i) follow the uniform distribution. Consequently, for each i, both -2ln(p(i)) and -2ln(1-p(i)) follow the chi-square distribution with df=2. This leads to my question: when implementing Fisher's method, why do we take the sum of -2ln(p(i)) and not the sum of -2ln(1-p(i))?

    I'm asking because, if x(i) follows a standard exponential distribution, then p(i) = CDF(x(i)) = 1-exp(x(i)), and the inverse transformation is x(i) = -ln(1-p(i)), which points me towards wanting to take the sum of -2ln(1-p(i)) when implementing Fisher's method. Where is my reasoning going awry?


    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences, Dept of Biostatistics
    -------------------------------------------




  • 13.  RE: Probability question

    Posted 01-12-2015 19:52
    Whoops, I meant to say "p(i) = CDF(x(i)) = 1-exp(-x(i))".
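    The orientation of the two statistics can be compared directly. Under H0 both statistics below are chi-square with 2k degrees of freedom, so the difference is not in their null distribution. The difference is which outcomes land in the upper-tail rejection region:

    - Fisher's sum of -2ln(p(i)) grows when the p-values are small, which is evidence against H0.
    - The sum of -2ln(1-p(i)) grows when the p-values are close to 1.

    The sketch below uses hypothetical p-values of 0.01 and 0.02, and the helper names are illustrative only.

```python
import math


def fisher_stat(pvalues):
    # Fisher's statistic: large when the p-values are small (evidence against H0)
    return -2.0 * sum(math.log(p) for p in pvalues)


def complement_stat(pvalues):
    # Statistic from the question: large when the p-values are close to 1
    return -2.0 * sum(math.log(1.0 - p) for p in pvalues)


def chi2_sf_2k(x, k):
    # P(chi-square with 2k df >= x); closed form valid for even degrees of freedom
    return math.exp(-x / 2) * sum((x / 2) ** j / math.factorial(j) for j in range(k))


small = [0.01, 0.02]  # hypothetical p-values showing strong evidence against H0
for name, stat in [("sum of -2ln(p)", fisher_stat), ("sum of -2ln(1-p)", complement_stat)]:
    x = stat(small)
    print(f"{name}: statistic = {x:.3f}, upper-tail p = {chi2_sf_2k(x, len(small)):.4f}")
```

    Fisher's statistic gives an upper-tail p of about 0.002. The complement version gives about 0.9996, so it would reject only when the individual p-values are close to 1.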

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences, Dept of Biostatistics
    -------------------------------------------




  • 14.  RE: Probability question

    Posted 01-16-2015 01:59
    I don't know, so my answer is only a guess: calculating the sum of -2ln(p(i)) eliminates a whole lot of subtractions and, back when Fisher was doing much of his work, this would have been a good saving of time and a reduction in the probability of errors creeping in.

    (Actually, with some of the innumerate students that I have to teach, this would still be a significant consideration!)

    Kind regards,
    Ken Russell






  • 15.  RE: Probability question

    Posted 01-12-2015 10:46
    There are many ways to combine p-values. The best one would correspond to the level-alpha test with the highest power. If possible, it would be more straightforward to work with the original test statistics (the ones that produced the p-values) rather than an abstract combined test statistic.

    Philosophically, the final decision should be weighted toward the experiment with the larger sample size.
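    Weighting by sample size can be built into a Z-based combination. This is a minimal sketch of a weighted version of Stouffer's method, under these assumptions:

    - The p-values are independent and one-sided.
    - Each weight is the square root of that experiment's sample size. This is a common choice, but it is an assumption here.

    The function name and the sample sizes in the example are hypothetical.

```python
from statistics import NormalDist


def weighted_z_combined_p(pvalues, sample_sizes):
    """Weighted Stouffer combination of independent one-sided p-values.

    Weights w_i = sqrt(n_i) are an assumed choice; under H0,
    sum(w_i * z_i) / sqrt(sum(w_i**2)) is standard normal.
    """
    std_normal = NormalDist()
    weights = [n ** 0.5 for n in sample_sizes]
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in pvalues]
    z_weighted = sum(w * z for w, z in zip(weights, z_scores)) / sum(w * w for w in weights) ** 0.5
    return 1.0 - std_normal.cdf(z_weighted)


# Hypothetical: the second experiment is three times as large as the first
print(f"weighted combined p: {weighted_z_combined_p([0.08, 0.06], [40, 120]):.4f}")
```

    With equal sample sizes the weights cancel, and this reduces to the unweighted Stouffer combination.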


    -------------------------------------------
    Qing Kang
    Chief Scientist
    Statistical Intelligence Group, LLC
    -------------------------------------------




  • 16.  RE: Probability question

    Posted 01-12-2015 11:21
    There are several approaches to combining p-values. One was described by Fisher; here is a link to a paper by R. Elston describing that method. Hope this helps: http://darwin.cwru.edu/ref/view.php?id=316&article=Elston+Reprints


    -------------------------------------------
    Kevin O'Brien
    Professor
    East Carolina University
    -------------------------------------------




  • 17.  RE: Probability question

    Posted 01-12-2015 11:33
    I hope that the attached articles on methods of "combining p-values" will also be helpful.

    Kind regards,

    Dr. M. Shakil, MASA






  • 18.  RE: Probability question

    Posted 01-16-2015 09:50
    Hi Margot,

    There might be a different way to look at your problem, which is ultimately connected to these questions:

    1. Why was it important for you to run two different experiments as opposed to just one?
    2. Why run just 2 experiments instead of n experiments, where n > 2?

    The different way to approach the problem would be to combine (pool) the estimates of the effect/parameter of interest across the two experiments using something akin to a meta-analytic approach. This would also let you obtain a p-value. Sometimes it can help to think about a problem differently, so I wanted to add this to the mix of ideas you received from other colleagues.

    Isabella

    -------------------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    -------------------------------------------
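    Here is a minimal sketch of the pooling idea. It assumes each experiment reports an estimate of the same effect together with a standard error. It uses fixed-effect inverse-variance weighting, which is one standard meta-analytic choice; a random-effects model would be another. The function name, estimates, and standard errors are hypothetical.

```python
from statistics import NormalDist


def fixed_effect_pool(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooling of independent estimates.

    Returns the pooled estimate, its standard error, a z statistic for
    H0: effect = 0, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    z = pooled / pooled_se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return pooled, pooled_se, z, p_two_sided


# Hypothetical effect estimates and standard errors from two experiments
est, se, z, p = fixed_effect_pool([0.30, 0.25], [0.20, 0.15])
print(f"pooled = {est:.3f}, SE = {se:.3f}, z = {z:.2f}, p = {p:.4f}")
```

    The pooled estimate and its standard error also give an approximate 95% interval, pooled ± 1.96·SE. That interval speaks to the size of the effect, not only its significance.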


  • 19.  RE: Probability question

    Posted 01-16-2015 10:02
    Margot, another important question about your setup is this: are you interested only in establishing that the effect of interest in your study is significant, or do you also need to quantify the magnitude of this effect? If the magnitude of the effect is on your radar, then the meta-analytic approach I suggested might work. Combining the p-values alone will not give you an indication of the actual magnitude of the effect of interest (if that effect turns out to be significant).

    Isabella

    -------------------------------------------
    Isabella Ghement
    Ghement Statistical Consulting Company Ltd.
    -------------------------------------------


  • 20.  RE: Probability question

    Posted 01-16-2015 15:32
    Isabella,

    Thank you for your comments.

    I had collected a data set two years ago and finished the analysis only a few months ago. The result of my analysis was in the opposite direction from what I had hypothesized, and the p-value was close to significance but not significant. So I collected a second sample to see if I got the same results, which I pretty much did. I have considered pooling the two data sets, but I have not done that yet. Using Fisher's method, I get results that are significant at the level I use: alpha equal to 0.034 for a one-sided test. The two data sets are independent of each other. Looking at the results for the two data sets, I am quite certain that what I see in the data is real. I started out not knowing what to expect.

    Margot 

    -------------------------------------------
    Margot Tollefson
    Consultant
    Vanward Statistics
    -------------------------------------------