Statistical Consulting Section


probability calculation question

  • 1.  probability calculation question

    Posted 02-05-2013 21:00
    Hi all,

    For some reason I can't seem to remember how to address this seemingly basic probability question I'm facing, nor can I find the answer anywhere online, and so I was hoping someone in this group might be able to help me out.

    I have two sets of trials, type 1 and type 2, each following its own binomial distribution with success probabilities p1 and p2 (p1 is not necessarily equal to p2). I need to find the probability of exactly k successes in exactly n1+n2 trials, where, for my purposes, a success in a type 1 trial is indistinguishable from a success in a type 2 trial. I know that in all circumstances k>=min(n1,n2); that is, it's possible for all k successes to have occurred during trials of only one type, although by no means guaranteed, and in most cases there would be m successes in type 1 trials and k-m successes in type 2 trials.

    The solution is trivial when k=n1+n2 or when p1=p2, but I was having trouble generalizing it for larger n's. I thought at first this might follow the multinomial or multivariate hypergeometric distribution, but neither of those seems to be what I'm looking for.

    If anyone has any guidance on how to tackle this, it would be greatly appreciated.

    Thanks,
    Gabe Farkas


    -------------------------------------------
    Gabriel Farkas
    -------------------------------------------


  • 2.  RE:probability calculation question

    Posted 02-05-2013 21:14
    Gabe,

    I am quite sure that your probability distribution is just the weighted sum of the two individual distributions, with weights n1/(n1+n2) and n2/(n1+n2).

    Margot
    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------


  • 3.  RE:probability calculation question

    Posted 02-06-2013 14:19

    Gabe,

    Perhaps your problem can be treated as a convolution. If X1 and X2 are the binomial random variables corresponding to (n1, p1) and (n2, p2), respectively, then the total number of successes would be X = X1+X2. If X1 and X2 are independent, then the distribution of X would be the convolution of the two binomial distributions.
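    A minimal R sketch of the convolution idea, assuming independence and using hypothetical values of n1, n2, p1 and p2 chosen only for illustration. It builds the two pmf vectors with dbinom() and convolves them. Base R's convolve() gives the usual convolution when the second vector is reversed and type = "open" is used; because it works through the FFT, tiny negative round-off values are clamped to zero.

```r
# Hypothetical parameter values, chosen only for illustration
n1 <- 25; p1 <- 0.5
n2 <- 12; p2 <- 0.3

# pmf vectors: element j holds P(X = j - 1)
f1 <- dbinom(0:n1, n1, p1)
f2 <- dbinom(0:n2, n2, p2)

# Usual (open) convolution of the two pmfs: n1 + n2 + 1 entries,
# element j holding P(X1 + X2 = j - 1). convolve() uses the FFT,
# so clamp any tiny negative round-off to zero.
f_sum <- pmax(convolve(f1, rev(f2), type = "open"), 0)

# Total number of successes k = 0, ..., n1 + n2
k <- 0:(n1 + n2)
head(cbind(k, prob = f_sum))
```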

    Keith
    -------------------------------------------
    Keith Eberhardt
    Statistics, Mondelez International
    -------------------------------------------








  • 4.  RE:probability calculation question

    Posted 02-06-2013 11:09

    I would calculate the probability using the two binomial probabilities and summing all possibilities.

    For example, for k=2, the possibilities are x1=0, x2=2; x1=1, x2=1; and x1=2, x2=0.

    Compute the sum p1(0)*p2(2) + p1(1)*p2(1) + p1(2)*p2(0).

    I'm assuming independence of x1 and x2 and that p1 and p2 are known.
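    To make the k=2 example concrete, here is the same three-term sum in R for hypothetical values n1 = 3, p1 = 0.4, n2 = 2, p2 = 0.7 (chosen only for illustration). The helper functions f1() and f2() play the role of p1(.) and p2(.) above.

```r
n1 <- 3; p1 <- 0.4   # hypothetical type 1 trials
n2 <- 2; p2 <- 0.7   # hypothetical type 2 trials

# p1(x) and p2(x) in the notation above
f1 <- function(x) dbinom(x, n1, p1)
f2 <- function(x) dbinom(x, n2, p2)

# All ways to get k = 2 total successes: (0,2), (1,1), (2,0)
prob_k2 <- f1(0) * f2(2) + f1(1) * f2(1) + f1(2) * f2(0)
prob_k2   # 0.3132
```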

    Best regards,
    -------------------------------------------
    Michael Morton
    -------------------------------------------








  • 5.  RE:probability calculation question

    Posted 02-06-2013 11:48


    Could this be an approach?

    1. Since all trials are independent, the order in which the n1+n2 trials are conducted does not matter.
    2. Since we are looking for P(k successes altogether), once again the trial numbers yielding
       the k successes do not matter.
    3. Therefore, one could think of this problem as: X1 ~ Bin(n1,p1), X2 ~ Bin(n2,p2), X1 and X2 independent, and
       we are looking for P(X1+X2=k), which should be the convolution of the two binomial pmfs:
           P(X1+X2=k) = \sum_{t=0}^{k} P(X1=t) P(X2=k-t).

    [I apologize if I simplified the problem too drastically by missing a key feature.]
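    One quick way to gain confidence in the convolution formula is a simulation check. This R sketch (hypothetical parameter values, independence assumed) draws X1 and X2 with rbinom() and compares the empirical frequency of X1+X2=k with the exact sum over t.

```r
set.seed(1)                           # reproducible draws
n1 <- 10; p1 <- 0.6                   # hypothetical values
n2 <- 8;  p2 <- 0.25
k  <- 7                               # hypothetical target count

# Exact value: sum over t of P(X1 = t) P(X2 = k - t); dbinom() returns 0
# whenever a count exceeds its n, so no explicit bounds are needed here
s <- 0:k
exact <- sum(dbinom(s, n1, p1) * dbinom(k - s, n2, p2))

# Simulation: independent draws of X1 and X2, then the share with X1 + X2 = k
B <- 200000
x_sum <- rbinom(B, n1, p1) + rbinom(B, n2, p2)
simulated <- mean(x_sum == k)

c(exact = exact, simulated = simulated)
```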

    -------------------------------------------
    Nagaraj Neerchal
    Professor and Chair
    UMBC
    -------------------------------------------








  • 6.  RE:probability calculation question

    Posted 02-06-2013 11:12
    I think you have to write this as the convolution...

    Pr(X_1 + X_2 = k) = sum_{i=0:k}  Pr(X_1 = i) Pr(X_2=k-i)

    Each Pr(*) part is the binomial pmf. I don't think this reduces down at all (unless p_1 = p_2). Some of these terms become zero when i > n_1 or k-i > n_2, etc., so it's easy to set up an R function for this, but I don't see any simple analytical formula falling out of this.
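    As a minimal sketch of such an R function (the name dsum2binom is hypothetical, not from any package): it is vectorized over k and relies on dbinom() returning 0 for counts outside 0..n, so terms with i > n_1 or k-i > n_2 drop out without writing the bounds explicitly.

```r
# Hypothetical helper: P(X1 + X2 = k) for independent
# X1 ~ Binomial(n1, p1) and X2 ~ Binomial(n2, p2)
dsum2binom <- function(k, n1, p1, n2, p2) {
  sapply(k, function(kk) {
    if (kk < 0 || kk > n1 + n2) return(0)   # outside the support
    i <- 0:kk
    # dbinom() gives 0 when i > n1 or kk - i > n2, so those terms vanish
    sum(dbinom(i, n1, p1) * dbinom(kk - i, n2, p2))
  })
}

# Usage with hypothetical values: the pmf over all k should sum to 1
pmf <- dsum2binom(0:37, n1 = 25, p1 = 0.5, n2 = 12, p2 = 0.3)
sum(pmf)
```

    When p1 = p2 the result reduces to dbinom(k, n1 + n2, p), which is a handy sanity check.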

    -------------------------------------------
    Scott Berry
    Berry Consultants
    -------------------------------------------








  • 7.  RE:probability calculation question

    Posted 02-06-2013 11:25

    Scott has it right.  An explicit enumeration is what you'll need to do.  (The "outer" function in R would be handy for this.)

    If I understand the text description, what you have is the following:

      x1 ~ Binomial(n1,p1)
      x2 ~ Binomial(n2,p2)
      k = x1 + x2

      Need conditional distribution of k | k > min(n1,n2).

    Jim

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------








  • 8.  RE:probability calculation question

    Posted 02-06-2013 11:28
    Sorry, my modulus sign didn't come through correctly.  That should be

        Need conditional distribution of k given k > min(n1,n2)

    Jim

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------








  • 9.  RE:probability calculation question

    Posted 02-06-2013 11:43
    Agree with the comments below by Scott and James. The reply that I sent to G. Farkas last night was not correct.

    The problem as I understand it is to determine the probability of exactly K successes after a total of N1 + N2 trials, in which the probability of success for any type 1 trial is p1 and the probability of success for any type 2 trial is p2.

    We will define m as the number of successes after N1 type 1 trials, and let K-m be the number of successes after N2 type 2 trials.

    The probability of exactly m successes after N1 type one trials is given from the binomial distribution using N1, m, and p1.

    The probability of exactly K-m successes after N2 type two trials is given from the binomial distribution using N2, (K-m), and p2.


    The probability of exactly K successes after N1 + N2 trials is given by the PRODUCT of the two probabilities above, summed over all integer values of m from m = 0 to m = K.
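    Written out, with X_1 and X_2 as labels for the numbers of type 1 and type 2 successes, the sum described above is as follows. Terms with m > N1 or K-m > N2 are zero, so the range can be narrowed to the values of m where both binomial terms are nonzero:

```latex
P(X_1 + X_2 = K)
  = \sum_{m=\max(0,\,K-N_2)}^{\min(K,\,N_1)}
    \binom{N_1}{m} p_1^{m} (1-p_1)^{N_1-m}
    \binom{N_2}{K-m} p_2^{K-m} (1-p_2)^{N_2-K+m}
```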

    -------------------------------------------
    Allen Heller
    Vice president, Medical Science
    Bayer HealthCare Pharmaceuticals
    -------------------------------------------












  • 10.  RE:probability calculation question

    Posted 02-06-2013 12:32

    Here is some R code to do this.

    # Binomial problem

    # Set parameter values
    n1 = 25
    n2 = 12
    p1 = 0.5
    p2 = 0.3

    # Possible values for x1 and x2
    x1 = c(0:n1)
    x2 = c(0:n2)

    # Joint probability function of x1 and x2
    pr = outer(dbinom(x1,n1,p1),dbinom(x2,n2,p2))
    x1plusx2 = outer(x1,x2,"+")

    # Now calculate conditional distribution of k
    # where k = x1 + x2 given that k >= min(n1,n2)
    k = c(min(n1,n2):(n1+n2))
    pr.k = NULL
    for (i in k) {
        pr.k[i-min(n1,n2)+1] = sum(pr[x1plusx2==i])/sum(pr[x1plusx2>=min(n1,n2)])
    }

    # Show some results
    cbind(k,pr.k)
    plot(k,pr.k)

     

    Jim

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------








  • 11.  RE:probability calculation question

    Posted 02-06-2013 11:35
    Are the original conditions as stated correct?

    The text seems to imply it is possible for k to equal X1 (if X2 is zero) or vice versa (which presumably would be the default understanding), whereas the condition is stated as k>=min(n1, n2), which is an entirely different matter.



    -------------------------------------------
    Jeffrey Finman Ph.D.
    Jupiter Point Pharma Consulting, LLC
    -------------------------------------------








  • 12.  RE:probability calculation question

    Posted 02-06-2013 11:38
    Yes.  I'm not having a good keyboard day today.  It should be k >= min(n1,n2) rather than k > min(n1,n2).

    Jim

    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------








  • 13.  RE:probability calculation question

    Posted 02-06-2013 13:18
    Thanks everyone for all the great feedback! I guess I feel a little better that it's not just me getting tripped up by this.

    Regarding Michael Morton's suggestion to "calculate the probability using the two binomial probabilities and summing all possibilities", that is what I originally thought of too, and I believe that is correct. However, that also requires manual calculations for each combination of k/n1/n2, and I was trying to figure out if there was a generalizable solution. And yes, there is independence of x1 and x2, and p1 and p2 are both known.

    Regarding Jeffrey Finman's question about the original conditions, he brings up a good point. Please let me clarify what I meant in my original e-mail. When I wrote that "k>=min(n1,n2)", it was more an observation about the particulars of my dataset and the desired values of k. Strictly speaking, given the data I have, k could be (and sometimes is) any number between 0 and n1+n2. However, for the purposes of my particular analysis, the desired values of k for which I need to do the calculation always happen to meet the condition "k>=min(n1,n2)". But yes, in general it is possible for k to equal X1 (if X2 is zero) or vice versa.

    Regarding Margot's original suggestion to take the weighted sum of the two individual distributions, with weights n1/(n1+n2) and n2/(n1+n2), when I tried that I found something interesting. Compared to the manual calculations I did for a few cases, when p1=p2 her solution matched the manual summation exactly. However, when p1-p2=0.05, the weighting solution started to deviate from the manual summation by somewhere in the neighborhood of 0.0005 to 0.001, for the cases I tested with n's in the range of 2 to 4.
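    The exact calculation used for the weighted approach isn't shown, so the following is an assumption: one reading that matches exactly when p1 = p2 is a single Binomial(n1+n2, p_bar) with the weighted-average probability p_bar = (n1*p1 + n2*p2)/(n1+n2). (A literal mixture w1*f1 + w2*f2 puts no mass above max(n1, n2), so it cannot match the sum even when p1 = p2.) This R sketch, with hypothetical n's in the 2 to 4 range, compares that pooled binomial with the exact convolution.

```r
# Hypothetical small-n example
n1 <- 3; n2 <- 4
k  <- 0:(n1 + n2)

# Exact pmf of X1 + X2 (convolution of the two binomial pmfs)
exact_pmf <- function(p1, p2) {
  sapply(k, function(kk) {
    i <- 0:kk
    sum(dbinom(i, n1, p1) * dbinom(kk - i, n2, p2))
  })
}

# Assumed reading of the weighted approach: one binomial with pooled p
pooled_pmf <- function(p1, p2) {
  p_bar <- (n1 * p1 + n2 * p2) / (n1 + n2)
  dbinom(k, n1 + n2, p_bar)
}

# Largest absolute difference over k
max_diff <- function(p1, p2) max(abs(exact_pmf(p1, p2) - pooled_pmf(p1, p2)))

max_diff(0.40, 0.40)   # equal p's: zero up to round-off
max_diff(0.45, 0.40)   # p1 - p2 = 0.05: small but nonzero
```

    The pooled binomial has the same mean as X1 + X2 but a variance that is larger by n1*(p1 - p_bar)^2 + n2*(p2 - p_bar)^2, which is why it is exact only when p1 = p2.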

    Also, regarding the suggestions to sum over all possibilities, that was my first thought as well, but I was hoping there was some distribution, mathematical identity, or something similar that would allow for generalizing the solution without having to specify the lower and upper bounds of summation for each case. If not, then I guess I'll have to approach the problem differently.


    Thanks again,
    Gabe



    -------------------------------------------
    Gabriel Farkas
    -------------------------------------------








  • 14.  RE:probability calculation question

    Posted 02-06-2013 14:31
    So it is the distribution of the sum of two binomial random variables.  I've modified the R code I sent earlier to calculate that (rather than the sum conditional on the sum being at least min(n1,n2)) and also attached a Wolfram CDF document and a screenshot where you can move sliders to examine the distribution for different combinations of n1, n2, p1, and p2.
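    The attached script itself is not reproduced here. As a sketch of what dropping the conditioning from the earlier outer()-based code could look like (same hypothetical parameter values as that code), the joint probabilities can be grouped by x1 + x2 with tapply():

```r
n1 <- 25; n2 <- 12
p1 <- 0.5; p2 <- 0.3

x1 <- 0:n1
x2 <- 0:n2

# Joint probabilities P(X1 = x1, X2 = x2) under independence
pr <- outer(dbinom(x1, n1, p1), dbinom(x2, n2, p2))
x1plusx2 <- outer(x1, x2, "+")

# Unconditional distribution of k = x1 + x2: add up the joint
# probabilities of all cells that share the same total
pr.k <- tapply(pr, x1plusx2, sum)
k <- as.integer(names(pr.k))

cbind(k, pr.k)
```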

    Jim


    -------------------------------------------
    James Baldwin
    Station Statistician
    US Forest Service
    -------------------------------------------






    Attachment(s):
    sum of two binomials1.cdf (12 KB)
    sum of two binomials1.r (467 B)


  • 15.  RE:probability calculation question

    Posted 02-07-2013 21:44
    For anyone still interested, I ended up needing to set up a partially-manual and partially-iterative calculation for each case of combinations of k/n1/n2 that arise in the data.

    It looks like what I was after is a "Poisson Binomial Distribution". A decent reference I found is:
    http://www3.stat.sinica.edu.tw/statistica/oldpdf/A7n44.pdf
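    For reference, the Poisson binomial distribution is the distribution of the number of successes in independent trials that each have their own success probability; the two-group case in this thread is the special case with n1 trials at p1 and n2 trials at p2. One standard way to compute its pmf without working out summation bounds is to add one trial at a time. This R sketch shows that recursion (the function name dpoisbinom_dp is hypothetical, and this is not necessarily the method used in the linked paper):

```r
# Hypothetical helper: pmf of the number of successes in independent
# Bernoulli trials with success probabilities p (one entry per trial).
# Element j of the result is P(total = j - 1).
dpoisbinom_dp <- function(p) {
  pmf <- 1                              # zero trials: P(total = 0) = 1
  for (q in p) {
    # New trial fails (count unchanged) or succeeds (count shifts up by one)
    pmf <- c(pmf * (1 - q), 0) + c(0, pmf * q)
  }
  pmf
}

# Two groups as in this thread: n1 trials at p1, n2 trials at p2
n1 <- 4; p1 <- 0.55                     # hypothetical values
n2 <- 3; p2 <- 0.50
pmf <- dpoisbinom_dp(c(rep(p1, n1), rep(p2, n2)))
names(pmf) <- 0:(n1 + n2)
pmf
```

    Each update costs one pass over the current pmf, so the whole calculation takes on the order of (n1+n2)^2 operations and works for any mix of per-trial probabilities.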

    Thanks again to everyone for the great suggestions and ideas, they definitely were helpful!

    -------------------------------------------
    Gabriel Farkas
    -------------------------------------------








  • 16.  RE:probability calculation question

    Posted 02-06-2013 15:27
    Hi,

    I haven't had a chance to read all the posts, but I found this article on the bivariate binomial distribution interesting:

    "A new bivariate binomial distribution"

    http://www.sciencedirect.com/science/article/pii/S0167715202003231

    -------------------------------------------
    Patrick Spagon
    -------------------------------------------








  • 17.  RE:probability calculation question

    Posted 02-06-2013 12:02
    Scott and James,

    Each observation comes from one or the other group. You are describing a joint probability for two binomial variables. You are right, though, that you need to sum over the possible ways to get k.

    Margot

    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------