Statistical Consulting Section

probability calculation question

1. probability calculation question

Gabriel Farkas
Posted 02-05-2013 21:00
Hi all,

For some reason I can't seem to remember how to address this seemingly basic probability question I'm facing, nor can I find the answer anywhere online, and so I was hoping someone in this group might be able to help me out.

I have a situation with two sets of trials, n1 trials of type 1 and n2 trials of type 2, each following its own binomial distribution with success probabilities p1 and p2 (p1 is not necessarily equal to p2). I need to find the probability of exactly k successes in exactly n1+n2 trials where, for my purposes, a success in a type 1 trial is indistinguishable from a success in a type 2 trial. I know that in all circumstances k>=min(n1,n2); that is, it's possible for all k successes to have occurred during trials of only one type, although by no means guaranteed, and in most cases there would be m successes in type 1 trials and k-m successes in type 2 trials.

The solution is trivial when k=n1+n2, or when p1=p2, but I was having trouble generalizing it for larger n's. I thought at first this might follow the multinomial distribution or the multivariate hypergeometric, but neither of those seems to be what I'm looking for.
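For anyone who wants to check the two "trivial" cases numerically, here is a minimal base-R sketch. The helper name `sum_binom_pmf` is hypothetical, and it assumes the two sets of trials are independent. It sums P(X1 = m) P(X2 = k - m) over every split m, then confirms that the result collapses to a single Binomial(n1 + n2, p) when p1 = p2, and to p1^n1 * p2^n2 when k = n1 + n2.

```r
# Hypothetical helper: P(X1 + X2 = k) for independent
# X1 ~ Bin(n1, p1) and X2 ~ Bin(n2, p2), summing over the split m / k - m.
sum_binom_pmf <- function(k, n1, p1, n2, p2) {
  m <- 0:k
  # dbinom() returns 0 when m > n1 or k - m > n2, so no explicit bounds needed
  sum(dbinom(m, n1, p1) * dbinom(k - m, n2, p2))
}

n1 <- 4; n2 <- 3

# Case p1 = p2 = p: the sum is just Binomial(n1 + n2, p)
p <- 0.3
equal_p <- sapply(0:(n1 + n2), sum_binom_pmf, n1 = n1, p1 = p, n2 = n2, p2 = p)
all.equal(equal_p, dbinom(0:(n1 + n2), n1 + n2, p))

# Case k = n1 + n2: every trial of both types must succeed
p1 <- 0.6; p2 <- 0.2
all.equal(sum_binom_pmf(n1 + n2, n1, p1, n2, p2), p1^n1 * p2^n2)
```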

If anyone has any guidance on how to tackle this, it would be greatly appreciated.

Thanks,
Gabe Farkas

-------------------------------------------
Gabriel Farkas
-------------------------------------------
2. RE:probability calculation question

Margot Tollefson
Posted 02-05-2013 21:14
Gabe,

I am quite sure that your probability distribution is just the weighted sum of the two individual distributions, with weights n1/(n1+n2) and n2/(n1+n2).

Margot

-------------------------------------------
Margot Tollefson
Owner
Vanward Statistical Consulting
-------------------------------------------

3. RE:probability calculation question

Keith Eberhardt
Posted 02-06-2013 14:19
Gabe,

Perhaps your problem can be treated as a convolution. If X1 and X2 are the binomial random variables corresponding to (n1, p1) and (n2, p2), respectively, then the total number of successes would be X = X1+X2. If X1 and X2 are independent, then the distribution of X would be the convolution of the two binomial distributions.
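A minimal base-R sketch of that convolution, assuming X1 and X2 are independent (the function name `binom_sum_pmf` is hypothetical). It returns the whole distribution of X = X1 + X2 at once by sliding the pmf of X2 along each possible value of X1 and accumulating the products.

```r
# Distribution of X = X1 + X2 as the discrete convolution of two
# binomial pmfs (X1, X2 independent). Hypothetical helper name.
binom_sum_pmf <- function(n1, p1, n2, p2) {
  f1 <- dbinom(0:n1, n1, p1)   # P(X1 = 0), ..., P(X1 = n1)
  f2 <- dbinom(0:n2, n2, p2)   # P(X2 = 0), ..., P(X2 = n2)
  out <- numeric(n1 + n2 + 1)  # will hold P(X = 0), ..., P(X = n1 + n2)
  for (i in 0:n1) {
    idx <- i + 0:n2 + 1        # 1-based positions of X = i + j, j = 0..n2
    out[idx] <- out[idx] + f1[i + 1] * f2
  }
  out
}

pmf <- binom_sum_pmf(n1 = 25, p1 = 0.5, n2 = 12, p2 = 0.3)
sum(pmf)           # total probability: 1 up to rounding
sum((0:37) * pmf)  # mean: n1*p1 + n2*p2 = 16.1 up to rounding
```

Every term in the product already carries both binomial probabilities, so no special handling is needed near the edges of the support.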

Keith
-------------------------------------------
Keith Eberhardt
Statistics, Mondelez International
-------------------------------------------

4. RE:probability calculation question

Michael Morton
Posted 02-06-2013 11:09
I would calculate the probability using the two binomial probabilities and summing all possibilities.

For example, for k=2, the possibilities are x1=0, x2=2; x1=1, x2=1; and x1=2, x2=0.

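Compute the sum p1(0)*p2(2) + p1(1)*p2(1) + p1(2)*p2(0), where p1(x) and p2(x) are the binomial probabilities of x successes among the type 1 and type 2 trials, respectively.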

I'm assuming independence of x1 and x2 and that p1 and p2 are known.
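Michael's k=2 example, written out in base R as a minimal sketch. The parameter values are illustrative assumptions, not from the thread. The three terms are the three ways of splitting two successes between the two trial types, and the last line checks them against the general sum over all splits m = 0..k.

```r
# Michael's k = 2 example: enumerate the three ways to split two successes.
# Parameter values are illustrative only.
n1 <- 5; p1 <- 0.4
n2 <- 3; p2 <- 0.7
f1 <- function(x) dbinom(x, n1, p1)  # pmf of X1 (Michael's "p1(x)")
f2 <- function(x) dbinom(x, n2, p2)  # pmf of X2 (Michael's "p2(x)")

prob_k2 <- f1(0) * f2(2) + f1(1) * f2(1) + f1(2) * f2(0)
prob_k2

# The same number from the general sum over all splits m = 0..k
k <- 2
m <- 0:k
all.equal(prob_k2, sum(f1(m) * f2(k - m)))
```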

Best regards,
-------------------------------------------
Michael Morton
-------------------------------------------

5. RE:probability calculation question

Nagaraj Neerchal
Posted 02-06-2013 11:48
Could this be an approach?

1. Since all trials are independent, the order in which the n1+n2 trials are conducted does not matter.
2. Since we are looking for P(k successes all together), once again the trial numbers yielding the k successes do not matter.
3. Therefore, one could think of this problem as: X1 ~ Bin(n1,p1), X2 ~ Bin(n2,p2), X1 and X2 independent, and we are looking for P(X1+X2=k), which should be the convolution of the two binomial pmfs:

$$P(X_1 + X_2 = k) = \sum_{t=0}^{k} P(X_1 = t)\,P(X_2 = k - t).$$

[I apologize if I simplified the problem too drastically by missing a key feature.]

-------------------------------------------
Nagaraj Neerchal
Professor and Chair
UMBC
-------------------------------------------

6. RE:probability calculation question

Scott Berry
Posted 02-06-2013 11:12
I think you have to write this as the convolution...

$$\Pr(X_1 + X_2 = k) = \sum_{i=0}^{k} \Pr(X_1 = i)\,\Pr(X_2 = k - i)$$

Each Pr(·) term is a binomial pmf. I don't think this reduces down at all (unless p_1 = p_2). Some of these terms are zero when i > n_1 or k - i > n_2, etc., so it is easy to set up an R function for this, but I don't see any simple analytical formula falling out of it.
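One possible R function along those lines, as a minimal sketch (both helper names are hypothetical). The zero terms are exactly those outside max(0, k - n2) <= i <= min(k, n1), so the sum can be restricted to that range; alternatively, summing over the full range 0..k gives the same answer because `dbinom()` already returns 0 outside each binomial's support. The check at the end compares the two over every k.

```r
# Hypothetical helper: Pr(X1 + X2 = k) summing only over
# max(0, k - n2) <= i <= min(k, n1), i.e. skipping the zero terms.
pr_sum_bounded <- function(k, n1, p1, n2, p2) {
  if (k < 0 || k > n1 + n2) return(0)
  i <- max(0, k - n2):min(k, n1)
  sum(dbinom(i, n1, p1) * dbinom(k - i, n2, p2))
}

# Full range i = 0..k; dbinom() supplies the zeros automatically
pr_sum_full <- function(k, n1, p1, n2, p2) {
  i <- 0:k
  sum(dbinom(i, n1, p1) * dbinom(k - i, n2, p2))
}

n1 <- 10; p1 <- 0.35; n2 <- 4; p2 <- 0.8
ks <- 0:(n1 + n2)
bounded <- sapply(ks, pr_sum_bounded, n1 = n1, p1 = p1, n2 = n2, p2 = p2)
full    <- sapply(ks, pr_sum_full,    n1 = n1, p1 = p1, n2 = n2, p2 = p2)
all.equal(bounded, full)
```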

-------------------------------------------
Scott Berry
Berry Consultants
-------------------------------------------

7. RE:probability calculation question

Jim Baldwin
Posted 02-06-2013 11:25
Scott has it right. An explicit enumeration is what you'll need to do. (The "outer" function in R would be handy for this.)
If I understand the text description, what you have is the following:
x1 ~ Binomial(n1,p1)
x2 ~ Binomial(n2,p2)
k = x1 + x2
Need conditional distribution k ' k > min(n1,n2).

Jim

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------

8. RE:probability calculation question

Jim Baldwin
Posted 02-06-2013 11:28
Sorry, my modulus sign didn't come through correctly. That should be

Need conditional distribution of k given k > min(n1,n2)

Jim

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------

9. RE:probability calculation question

Allen Heller
Posted 02-06-2013 11:43
Agree with the comments below by Scott and James. The reply that I sent to G. Farkas last night was not correct.

The problem, as I understand it, is to determine the probability of exactly K successes after a total of N1 + N2 trials, in which the probability of success for any type 1 trial is p1 and the probability of success for any type 2 trial is p2.

We will define m as the number of successes in the N1 type 1 trials, and let K-m be the number of successes in the N2 type 2 trials.

The probability of exactly m successes after N1 type 1 trials is given by the binomial distribution with N1, m, and p1.

The probability of exactly K-m successes after N2 type 2 trials is given by the binomial distribution with N2, (K-m), and p2.

The probability of exactly K successes after N1 + N2 trials is the PRODUCT of the two probabilities above, summed over all integer values of m from m = 0 to m = K.

-------------------------------------------
Allen Heller
Vice president, Medical Science
Bayer HealthCare Pharmaceuticals
-------------------------------------------

10. RE:probability calculation question

Jim Baldwin
Posted 02-06-2013 12:32
Here is some R code to do this.

# Binomial problem
# Set parameter values
n1 = 25
n2 = 12
p1 = 0.5
p2 = 0.3
# Possible values for x1 and x2
x1 = 0:n1
x2 = 0:n2
# Joint probability function of x1 and x2 (independent, so an outer product)
pr = outer(dbinom(x1, n1, p1), dbinom(x2, n2, p2))
# Matrix of the corresponding sums x1 + x2
x1plusx2 = outer(x1, x2, "+")
# Now calculate the conditional distribution of k,
# where k = x1 + x2, given that k >= min(n1,n2)
kmin = min(n1, n2)
k = kmin:(n1 + n2)
denom = sum(pr[x1plusx2 >= kmin])  # P(k >= min(n1,n2)), computed once
pr.k = numeric(length(k))
for (i in seq_along(k)) {
  pr.k[i] = sum(pr[x1plusx2 == k[i]]) / denom
}
# Show some results
cbind(k, pr.k)
plot(k, pr.k)

Jim

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------

11. RE:probability calculation question

Jeffrey Finman
Posted 02-06-2013 11:35
Are the original conditions as stated correct?

The text seems to imply it is possible for k to equal X1 (if X2 is zero) or vice versa (which presumably would be the default understanding), whereas the condition is stated as k>=min(n1, n2), which is an entirely different matter.

-------------------------------------------
Jeffrey Finman Ph.D.
Jupiter Point Pharma Consulting, LLC
-------------------------------------------

12. RE:probability calculation question

Jim Baldwin
Posted 02-06-2013 11:38
Yes. I'm not having a good keyboard day today. It should be k >= min(n1,n2) rather than k > min(n1,n2).
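In symbols, writing K = X1 + X2 and c = min(n1, n2), the corrected conditional distribution (the quantity Jim's R code computes by dividing each sum by the total probability of k >= min(n1,n2)) is:

```latex
P(K = k \mid K \ge c) = \frac{P(K = k)}{P(K \ge c)}
  = \frac{\sum_{i=0}^{k} P(X_1 = i)\,P(X_2 = k - i)}{\sum_{j=c}^{n_1 + n_2} P(K = j)},
\qquad k = c, c + 1, \dots, n_1 + n_2 .
```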

Jim

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------

13. RE:probability calculation question

Gabriel Farkas
Posted 02-06-2013 13:18
Thanks everyone for all the great feedback! I guess I feel a little better that it's not just me getting tripped up by this.

Regarding Michael Morton's suggestion to "calculate the probability using the two binomial probabilities and summing all possibilities", that is what I originally thought of too, and I believe that is correct. However, that also requires manual calculations for each combination of k/n1/n2, and I was trying to figure out if there was a generalizable solution. And yes, there is independence of x1 and x2, and p1 and p2 are both known.

Regarding Jeffrey Finman's question about the original conditions, he brings up a good point. Please let me clarify what I meant in my original e-mail. When I wrote that "k>=min(n1,n2)", it was more an observation about the particulars of my dataset and the desired values of k. Strictly speaking, given the data I have, k could be (and sometimes is) any number between 0 and n1+n2. However, for the purposes of my particular analysis, the desired values of k for which I need to do the calculation always happen to meet the condition "k>=min(n1,n2)". But yes, in general it is possible for k to equal X1 (if X2 is zero) or vice versa.

Regarding Margot's original suggestion to take the weighted sum of the two individual distributions, with weights n1/(n1+n2) and n2/(n1+n2), when I tried that I found something interesting. Compared to the manual calculations I did for a few cases, when p1=p2 her solution matched the manual summation exactly. However, when p1-p2=0.05, the weighting solution started to deviate from the manual summation by somewhere in the neighborhood of 0.0005 to 0.001, for the cases I tested of n's in the range of 2 to 4.
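The thread does not show exactly how the weighting was computed, so the following is an assumption: one reading that matches the behaviour described (exact when p1 = p2, slightly off otherwise) is a single Binomial(n1 + n2, pbar) whose success probability pbar weights p1 and p2 by n1/(n1+n2) and n2/(n1+n2). This minimal base-R sketch compares that assumed approximation against the exact sum over splits; all function names are hypothetical.

```r
# Exact Pr(X1 + X2 = k), k = 0..n1+n2, summing over splits m / k - m
exact_pmf <- function(n1, p1, n2, p2) {
  sapply(0:(n1 + n2), function(k) {
    m <- 0:k
    sum(dbinom(m, n1, p1) * dbinom(k - m, n2, p2))
  })
}

# ASSUMED reading of the weighting idea: one binomial with a pooled p
pooled_pmf <- function(n1, p1, n2, p2) {
  pbar <- (n1 * p1 + n2 * p2) / (n1 + n2)
  dbinom(0:(n1 + n2), n1 + n2, pbar)
}

# Largest absolute gap between the two pmfs over all k
max_abs_diff <- function(n1, p1, n2, p2) {
  max(abs(exact_pmf(n1, p1, n2, p2) - pooled_pmf(n1, p1, n2, p2)))
}

max_abs_diff(3, 0.40, 4, 0.40)  # equal p: agrees up to rounding
max_abs_diff(3, 0.45, 4, 0.40)  # p1 - p2 = 0.05: small but nonzero gap
```

Under this reading the two pmfs always share the same mean, but the pooled binomial has the larger variance whenever p1 differs from p2, which is why the gap opens up as the probabilities move apart.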

Also, regarding the suggestions to sum over all possibilities, that was my first thought as well, but I was hoping there was some distribution, mathematical identity, or something similar that would allow for generalizing the solution without having to specify the lower and upper bounds of summation for each case. If not, then I guess I'll have to approach the problem differently.

Thanks again,
Gabe

-------------------------------------------
Gabriel Farkas
-------------------------------------------

14. RE:probability calculation question

Jim Baldwin
Posted 02-06-2013 14:31
So it is the distribution of the sum of two binomial random variables. I've modified the R code I sent earlier to calculate that (rather than the sum conditional on the sum being at least min(n1,n2)) and also attached a Wolfram CDF document and a screenshot where you can move sliders to examine the distribution for different combinations of n1, n2, p1, and p2.
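The modified script itself is only available as an attachment, so this is not Jim's file. It is a minimal sketch of one way to drop the conditioning from his earlier code: keep the same `outer()` joint table, then use base R's `tapply()` to add up the joint probabilities that share the same value of x1 + x2.

```r
# Parameter values from the earlier script
n1 <- 25; n2 <- 12
p1 <- 0.5; p2 <- 0.3
x1 <- 0:n1
x2 <- 0:n2

# Joint pmf of (x1, x2) under independence, and the matching table of sums
pr <- outer(dbinom(x1, n1, p1), dbinom(x2, n2, p2))
x1plusx2 <- outer(x1, x2, "+")

# Unconditional distribution of k = x1 + x2: group the joint
# probabilities by their sum and add each group up
pr.k <- tapply(as.vector(pr), as.vector(x1plusx2), sum)
k <- as.integer(names(pr.k))

cbind(k, pr.k)
```

Grouping by the sum avoids an explicit loop over k, and every k from 0 to n1 + n2 appears exactly once in the result.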

Jim

-------------------------------------------
James Baldwin
Station Statistician
US Forest Service
-------------------------------------------

Attachment(s)

sum of two binomials1.cdf (12 KB)

sum of two binomials1.r (467 B)

15. RE:probability calculation question

Gabriel Farkas
Posted 02-07-2013 21:44
For anyone still interested, I ended up needing to set up a partially-manual and partially-iterative calculation for each case of combinations of k/n1/n2 that arise in the data.

It looks like what I was after is a "Poisson Binomial Distribution". A decent reference I found is:
http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n44.pdf
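The Poisson binomial distribution covers independent trials that each have their own success probability. Here it is the special case where n1 trials share p1 and n2 trials share p2. This is a minimal base-R sketch of the standard recursion that adds one trial at a time (the function name `poisbinom_pmf` is hypothetical). It needs no per-case summation bounds, and the last line cross-checks one value against the convolution sum.

```r
# Poisson binomial pmf via the standard recursion: add one trial at a time.
# probs: success probability of each individual trial.
poisbinom_pmf <- function(probs) {
  pmf <- 1  # zero trials: P(0 successes) = 1
  for (p in probs) {
    # new P(j) = P(j) * (1 - p) + P(j - 1) * p
    pmf <- c(pmf * (1 - p), 0) + c(0, pmf * p)
  }
  pmf       # entries are P(0), ..., P(length(probs))
}

# Gabe's setting: n1 trials with p1 and n2 trials with p2
n1 <- 25; p1 <- 0.5
n2 <- 12; p2 <- 0.3
pmf <- poisbinom_pmf(c(rep(p1, n1), rep(p2, n2)))

# Cross-check against the convolution sum for one value of k
k <- 15
m <- 0:k
all.equal(pmf[k + 1], sum(dbinom(m, n1, p1) * dbinom(k - m, n2, p2)))
```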

Thanks again to everyone for the great suggestions and ideas, they definitely were helpful!

-------------------------------------------
Gabriel Farkas
-------------------------------------------

16. RE:probability calculation question

Patrick Spagon
Posted 02-06-2013 15:27
Hi,

I haven't had a chance to read all the posts, but I find the following article on the bivariate binomial distribution interesting:

"A new bivariate binomial distribution"

http://www.sciencedirect.com/science/article/pii/S0167715202003231

-------------------------------------------
Patrick Spagon
-------------------------------------------

17. RE:probability calculation question

Margot Tollefson
Posted 02-06-2013 12:02
Scott and James,

Each observation comes from one or the other group. You are describing a joint probability for two binomial variables. You are right though that you need to sum over the possible ways to get k.

Margot

-------------------------------------------
Margot Tollefson
Owner
Vanward Statistical Consulting
-------------------------------------------
