Probability question

1. Probability question

Margot Tollefson
Posted 01-11-2015 17:42
I am stuck on a probability question. I have run two experiments that measure the same thing and have two p-values. The experiments are independent, so I was thinking that the probability of seeing two p-values of the sizes that I found is the product of the two p-values. But minus the log of the product of two uniform random variables, which p-values are under the null hypothesis, is distributed gamma(2,1), which gives a larger p-value than the product of the p-values. What am I not seeing?

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------
2. RE: Probability question

Jim Baldwin
Posted 01-11-2015 18:22
I think you are attempting to equate Pr(X1>x1 and X2>x2) = (1-x1)(1-x2) (when X1 and X2 are independent uniform random variables) with Pr((1-X1)*(1-X2) > (1-x1)*(1-x2)).

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------
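
The two probabilities being contrasted here can be checked by simulation. The sketch below is illustrative only: it restates the comparison in terms of p-values P1 and P2 that are uniform on (0, 1) under the null, and the values p1 = 0.2 and p2 = 0.3, the seed, and the function name `simulate_regions` are assumptions chosen for the example, not taken from the thread.

```python
import math
import random


def simulate_regions(p1, p2, n_draws=100_000, seed=0):
    """Estimate two different null probabilities by Monte Carlo.

    rectangle: Pr(P1 <= p1 and P2 <= p2), which equals p1 * p2
    product:   Pr(P1 * P2 <= p1 * p2), the region under a hyperbola
    """
    rng = random.Random(seed)
    in_rectangle = 0
    in_product_region = 0
    for _ in range(n_draws):
        u1, u2 = rng.random(), rng.random()
        if u1 <= p1 and u2 <= p2:
            in_rectangle += 1
        if u1 * u2 <= p1 * p2:
            in_product_region += 1
    return in_rectangle / n_draws, in_product_region / n_draws


p1, p2 = 0.2, 0.3  # hypothetical p-values
rectangle, product_region = simulate_regions(p1, p2)
c = p1 * p2
exact_product_region = c * (1 - math.log(c))  # Gamma(2,1) upper tail at -log(c)

print(f"rectangle estimate:      {rectangle:.4f} (exact {c:.4f})")
print(f"product-region estimate: {product_region:.4f} (exact {exact_product_region:.4f})")
```

Every point in the rectangle also lies in the product region, which is why the second probability can never be smaller than the first.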
3. RE: Probability question

Margot Tollefson
Posted 01-11-2015 18:27
I guess I am wondering which would be appropriate as a p-value to use for combining both tests.

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------

4. RE: Probability question

David Bristol
Posted 01-11-2015 18:30
The first is a calculation of the probability that two independent events occur simultaneously which does not involve any distribution theory; the second is a statement about distribution theory.

Let T1 and T2 be the two test statistics, with observed values to1 and to2, respectively, where the p-values are p1 = P(T1 ≥ to1), and similarly for p2. The first statement is that P(T1 ≥ to1 and T2 ≥ to2) = p1*p2 since T1 and T2 are independent. The value p1*p2 is a realization of the random variable P1*P2, where -log(P1*P2) ~ Gam(2,1) under H0. I'm not sure what p-value you computed based on this distribution, but I'm guessing that it is Prob[Gamma(2,1) ≥ the observed value -log(p1*p2)]; it is not surprising that this is not equal to the observed value of -log(p1*p2).

The technical details above may not be perfect, but, unless I missed something, I think you'll get the drift.

David
-------------------------------------------
David Bristol
Statistical Consulting Services
-------------------------------------------
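
The calculation described in this post can be written out in a few lines. This is a minimal sketch, assuming the p-values 0.04 and 0.06 (made up for illustration) and the hypothetical helper names `gamma21_upper_tail` and `combine_two_p_values`; it uses the closed form Pr(G ≥ t) = exp(-t)(1 + t) for G ~ Gamma(2,1).

```python
import math


def gamma21_upper_tail(t):
    """Pr(G >= t) for G ~ Gamma(shape=2, rate=1), valid for t >= 0."""
    return math.exp(-t) * (1.0 + t)


def combine_two_p_values(p1, p2):
    """Combined p-value based on -log(P1 * P2) ~ Gamma(2, 1) under H0."""
    observed = -math.log(p1 * p2)
    return gamma21_upper_tail(observed)


p1, p2 = 0.04, 0.06  # hypothetical p-values
product = p1 * p2
combined = combine_two_p_values(p1, p2)

print(f"product of p-values: {product:.6f}")
print(f"Gamma(2,1) tail:     {combined:.6f}")
```

Because exp(-t) equals p1*p2 at the observed value, the tail probability is p1*p2 multiplied by (1 - log(p1*p2)), which exceeds the raw product whenever the product is below 1.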
5. RE: Probability question

Margot Tollefson
Posted 01-11-2015 18:42
Thank you David. So I should use the distributional approach?

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------

6. RE: Probability question

David Bristol
Posted 01-11-2015 20:17
Margot,
I think that the distribution theory should be used and that, up to a transformation, the product of the p-values is the observed value of the test statistic.

-------------------------------------------
David Bristol
Statistical Consulting Services
-------------------------------------------

7. RE: Probability question

Michael Morton
Posted 01-11-2015 20:33
The product of the two p-values does not correspond to a legitimate way to directly combine the two p-values.

If it were the correct approach, then imagine extending it to 5 experiments, each with p=0.5 (i.e., absolutely no indication that the null hypothesis is false). By the obvious extension of the hypothesized approach, pcombined=0.5^5=0.031<0.05. I.e., if you were to combine a series of experiments in the described manner, the p-values would all be <1 and the product of enough of them would eventually be less than any chosen alpha.

I believe the gamma distribution you described is a reasonable approach. I don't know if it is optimal in any sense.

Mike Morton
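
The five-experiment example can be run through both approaches. The sketch below is illustrative, not part of the original post: `chi2_upper_tail_even_df` is a hypothetical helper that relies on the closed-form chi-square tail for an even number of degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Pr(X >= x) for X ~ chi-square with an even number of degrees of freedom.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)


p_values = [0.5] * 5  # the five uninformative experiments from the post

naive_product = math.prod(p_values)
fisher_statistic = -2.0 * sum(math.log(p) for p in p_values)
fisher_p = chi2_upper_tail_even_df(fisher_statistic, df=2 * len(p_values))

print(f"naive product:     {naive_product:.5f}")
print(f"Fisher combined p: {fisher_p:.4f}")
```

The naive product falls below 0.05, while the distribution-based combined p-value comes out at roughly 0.73, consistent with five experiments that carry no evidence against the null.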

8. RE: Probability question

Nagaraj Neerchal
Posted 01-11-2015 20:39
This is one of the topics in meta-analysis, and there are probably several good
sources. Here's a relevant wiki page:

http://en.wikipedia.org/wiki/Fisher%27s_method

Nagaraj

--
Nagaraj K. Neerchal, PhD.
Professor of Statistics
Chair, Dept of Math and Stat
Director, Center for Interdisciplinary Research and Consulting
UMBC, Baltimore, MD 21250

Skill in Action is Yoga. (BG 2:50)
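
For reference, the method on the linked page connects to the gamma(2,1) statement in the original question as follows. This is a summary in standard notation, assuming k independent tests whose p-values are uniform under the global null; it is not text quoted from the thread.

```latex
\begin{align*}
X^2 &= -2 \sum_{i=1}^{k} \ln p_i \;\sim\; \chi^2_{2k}
      \quad \text{under } H_0, \\[4pt]
\text{for } k = 2:\quad
-\ln(P_1 P_2) &\sim \mathrm{Gamma}(2, 1)
      \;\Longleftrightarrow\;
      -2 \ln(P_1 P_2) \sim \chi^2_{4}, \\[4pt]
p_{\text{combined}}
  &= \Pr\!\left( \chi^2_{4} \ge -2 \ln(p_1 p_2) \right)
   = p_1 p_2 \, \bigl( 1 - \ln(p_1 p_2) \bigr).
\end{align*}
```

So the gamma(2,1) calculation and Fisher's chi-square statistic with 4 degrees of freedom give the same combined p-value for two tests.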

9. RE: Probability question

Margot Tollefson
Posted 01-11-2015 21:38
Thank you David, James, and Nagaraj for your comments. They are helpful. The link that you provided answers my question, Nagaraj. Thank you again.

Margot

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------

10. RE: Probability question

Eric Vance
Posted 01-12-2015 05:47
If the group's response was helpful for you to find a solution, please post a summary of the solution so others can benefit as well. I'm curious about what you ended up doing.

Thanks!

-------------------------------------------
Eric Vance
CNSL Chair
LISA (Virginia Tech's Laboratory for Interdisciplinary Statistical Analysis)
Director and Associate Research Professor
Blacksburg VA, United States
-------------------------------------------

11. RE: Probability question

Margot Tollefson
Posted 01-12-2015 12:03
The consensus is that Fisher's method of combining p values - from the Wikipedia page that Nagaraj sent a link to and the paper that Kevin O'Brien sent a link to - or the method of combining z scores in the Wikipedia link would be appropriate. Nagaraj's comment that combining pvalues falls under meta-analysis is helpful too.

Margot

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------
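
The z-score alternative mentioned in this summary is commonly known as Stouffer's method. A minimal sketch follows, assuming one-sided p-values strictly between 0 and 1; the input values and the function name `stouffer_combined_p` are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def stouffer_combined_p(p_values):
    """Combine one-sided p-values by summing their z-scores (Stouffer's method).

    Each p-value must lie strictly between 0 and 1, otherwise inv_cdf raises.
    """
    z_scores = [STANDARD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - STANDARD_NORMAL.cdf(combined_z)


p_values = [0.04, 0.06]  # hypothetical one-sided p-values
print(f"Stouffer combined p: {stouffer_combined_p(p_values):.4f}")
```

Unlike the raw product, this combination returns 0.5 when every input is 0.5, so uninformative experiments do not accumulate into apparent significance.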

12. RE: Probability question

Eric Siegel
Posted 01-12-2015 19:46
I have a question about Fisher's method. For i = 1 to k, let p(i) be a set of p-values arising from k tests. Under a global null hypothesis of no effect among the k tests, both p(i) and 1-p(i) follow the uniform distribution. Consequently, for each i, both -2ln(p(i)) and -2ln(1-p(i)) follow the chi-square distribution with df=2. This leads to my question: when implementing Fisher's method, why do we take the sum of -2ln(p(i)) and not the sum of -2ln(1-p(i))?

I'm asking because, if x(i) follows a standard exponential distribution, then p(i) = CDF(x(i)) = 1-exp(x(i)), and the inverse transformation is x(i) = -ln(1-p(i)), which points me towards wanting to take the sum of -2ln(1-p(i)) when implementing Fisher's method. Where is my reasoning going awry?

-------------------------------------------
Eric Siegel
Biostatistician
Univ of Arkansas for Medical Sciences of Biostatistics
-------------------------------------------

13. RE: Probability question

Eric Siegel
Posted 01-12-2015 19:52
whoops, I meant to say "p(i) = CDF(x(i)) = 1-exp(-x(i))"

-------------------------------------------
Eric Siegel
Biostatistician
Univ of Arkansas for Medical Sciences of Biostatistics
-------------------------------------------
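
One way to see the difference is to try both statistics on the same p-values. Both have a chi-square null distribution, but they are large in opposite situations: the sum of -2ln(p(i)) is large when the p-values are small, while the sum of -2ln(1-p(i)) is large when the p-values are near 1. Also, if large values of x(i) count as evidence against the null, the p-value is the upper tail, p(i) = 1 - CDF(x(i)) = exp(-x(i)), so the inverse is x(i) = -ln(p(i)). The sketch below is an illustration with made-up p-values and a hypothetical helper `fisher_two_tests`; it is not a reply from the thread.

```python
import math


def fisher_two_tests(p1, p2):
    """Fisher's combined p-value for two tests: c * (1 - ln c), with c = p1 * p2."""
    c = p1 * p2
    return c * (1.0 - math.log(c))


p1, p2 = 0.01, 0.02  # hypothetical small p-values, i.e. evidence against H0

using_p = fisher_two_tests(p1, p2)                    # statistic: -2*sum(ln p)
using_one_minus_p = fisher_two_tests(1 - p1, 1 - p2)  # statistic: -2*sum(ln(1-p))

print(f"sum of -2 ln(p):   combined p = {using_p:.4f}")
print(f"sum of -2 ln(1-p): combined p = {using_one_minus_p:.4f}")
```

With two small p-values, the first version gives a small combined p-value and the second gives one close to 1, so the 1-p version would only reject when the individual tests look unusually consistent with the null.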

14. RE: Probability question

Kenneth Russell
Posted 01-16-2015 01:59
I don't know, so my answer is only a guess: calculating the sum of -2ln(p(i)) eliminates a whole lot of subtractions and, back when Fisher was doing much of his work, this would have been a good saving of time and a reduction in the probability of errors creeping in.

(Actually, with some of the innumerate students that I have to teach, this would still be a significant consideration!)

Kind regards,
Ken Russell

15. RE: Probability question

Qing Kang
Posted 01-12-2015 10:46
There are many ways to combine p-values. The best one would correspond to a level-alpha test with the highest power. If possible, it would be more straightforward to work with the original test statistics (that produced the p-values) rather than an abstract test statistic.

Philosophically, the final decision should be weighted toward the experiment with a larger sample size.

-------------------------------------------
Qing Kang
Chief Scientist
Statistical Intelligence Group, LLC
-------------------------------------------
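
One common way to weight the decision toward the larger experiment is a weighted z-score combination. The sketch below assumes weights equal to the square root of each sample size, which is a conventional choice and not something specified in this post; the p-values, sample sizes, and the name `weighted_z_combined_p` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def weighted_z_combined_p(p_values, sample_sizes):
    """Weighted z-score combination with weights sqrt(n_i) (an assumed choice)."""
    weights = [sqrt(n) for n in sample_sizes]
    z_scores = [STANDARD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    return 1.0 - STANDARD_NORMAL.cdf(numerator / denominator)


p_values = [0.04, 0.30]  # hypothetical one-sided p-values
small_first = weighted_z_combined_p(p_values, sample_sizes=[20, 200])
large_first = weighted_z_combined_p(p_values, sample_sizes=[200, 20])
print(f"p = 0.04 from the small study: combined p = {small_first:.4f}")
print(f"p = 0.04 from the large study: combined p = {large_first:.4f}")
```

The same pair of p-values leads to a smaller combined p-value when the stronger result comes from the larger study.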

16. RE: Probability question

Kevin O'Brien
Posted 01-12-2015 11:21
There are several approaches to combining p-values. One was described by Fisher; here is a link to a paper by R. Elston describing that method. Hope this helps: http://darwin.cwru.edu/ref/view.php?id=316&article=Elston+Reprints

-------------------------------------------
Kevin O'Brien
Professor
East Carolina University
-------------------------------------------

17. RE: Probability question

Mohammad Shakil
Posted 01-12-2015 11:33
I hope that the attached articles will also be helpful on the methods of "combining p-values".

Kind regards,

Dr. M. Shakil, MASA


Attachment(s)

Art Owen, KARL PEARSON'S META-ANALYSIS REVISITED, The Ann... (PDF, 275 KB)

Combining p-values in large scale genomics experiments (PDF, 494 KB)

Truncated Product Method for Combining p-values (PDF, 218 KB)

18. RE: Probability question

Isabella Ghement
Posted 01-16-2015 09:50
Hi Margot,

There might be a different way to look at your problem, which is ultimately connected to these questions:

1. Why was it important for you to run two different experiments as opposed to just one?
2. Why run just 2 experiments instead of n experiments, where n > 2?

The different way to approach the problem would be to combine (pool) the estimates of the effect/parameter of interest across the two experiments using something akin to a meta-analytic approach. This would also enable you to obtain a p-value.

Sometimes it can help to think about a problem differently, so I wanted to throw this in the mix of ideas you received from other colleagues.

Isabella

-------------------------------------------
Isabella Ghement
Ghement Statistical Consulting Company Ltd.
-------------------------------------------

19. RE: Probability question

Isabella Ghement
Posted 01-16-2015 10:02
Margot, another important question for your setup is this: Are you interested just in establishing that the effect of interest in your study is significant, or do you also need to quantify the magnitude of this effect?

If magnitude of effect is on your radar, then the meta-analytical approach I suggested might work. Combining the p-values alone will not give you an indication of the actual magnitude of the effect of interest (if that effect turns out to be significant).

Isabella

-------------------------------------------
Isabella Ghement
Ghement Statistical Consulting Company Ltd.
-------------------------------------------
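
A pooled estimate of the kind suggested here can be sketched with inverse-variance (fixed-effect) weighting, one standard meta-analytic choice. Everything specific in the example is an assumption for illustration: the effect estimates, their standard errors, the normal approximation for the test, and the function name `pool_fixed_effect`.

```python
from math import sqrt
from statistics import NormalDist


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooling of independent estimates."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect estimates and standard errors from two experiments.
estimates = [0.42, 0.35]
standard_errors = [0.25, 0.20]

pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
z = pooled / pooled_se
one_sided_p = 1.0 - NormalDist().cdf(z)  # H1: effect > 0, normal approximation

print(f"pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"z = {z:.2f}, one-sided p = {one_sided_p:.4f}")
```

In contrast to combining p-values, this yields an effect size with a standard error as well as a p-value.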

20. RE: Probability question

Margot Tollefson
Posted 01-16-2015 15:32
Isabella,

Thank you for your comments.

I had collected a data set two years ago and just finished the analysis a few months ago. The result of my analysis was in the opposite direction from what I had hypothesized, and the p-value was close to, but did not reach, significance. So I collected a second sample to see if I got the same results, which I pretty much did. I have considered pooling the two datasets, but I have not done that yet. Using Fisher's method, I get some results that are significant at the level I use: alpha equal to 0.034 for a one-sided test. The two data sets are independent of each other. Looking at the results for the two data sets, I am quite certain that what I see in the data is real. I started out not knowing what to expect.

Margot

-------------------------------------------
Margot Tollefson
Consultant
Vanward Statistics
-------------------------------------------
