ASA Connect

Finite Population Correction Factor for Covariance of Two Multinomial Proportions

    Posted 11-20-2022 17:00
    Edited by Isabella Ghement 11-20-2022 18:20
    Hi everyone, 

    I am working on a multinomial proportions type of problem and need some help.

    Problem: 

    We sample n items out of a finite population of size N (at random, without replacement) and categorize each into one of 4 categories: C1, C2, C3 or C4. The counts of items falling into these categories are x1, x2, x3 and x4, respectively (written n1, n2, n3 and n4 in the formulas below), such that x1 + x2 + x3 + x4 = n. We can use p1 = x1/n, p2 = x2/n, p3 = x3/n and p4 = x4/n as estimates of the true proportions P1, P2, P3 and P4 of items falling into the 4 categories in the population, respectively.


    The goal is to construct a confidence interval for the difference in the true proportions P1 - P2 using the difference in the estimated proportions p1 and p2 as well as the estimated standard error of the difference p1 - p2. 
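
    For reference, the standard error of the difference combines the two variances and the covariance through the usual identity for the variance of a difference. The block below writes this out, together with the resulting normal-approximation interval; the critical value z_{\alpha/2} is notation introduced here for illustration and does not appear in the original problem statement.

```latex
\operatorname{Var}(p_1 - p_2)
  = \operatorname{Var}(p_1) + \operatorname{Var}(p_2)
    - 2\,\operatorname{Cov}(p_1, p_2),
\qquad
(p_1 - p_2) \;\pm\; z_{\alpha/2}\,
  \sqrt{\widehat{\operatorname{Var}}(p_1)
      + \widehat{\operatorname{Var}}(p_2)
      - 2\,\widehat{\operatorname{Cov}}(p_1, p_2)}
```

    Because the covariance between two multinomial-type proportions is negative, the term -2 Cov(p1, p2) increases the variance of the difference, which is why the covariance cannot simply be dropped.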

    Because the population is finite, the estimated standard errors associated with p1 and p2 would look like this: 

    SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n1/N) )

    SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n2/N) )

    The quantities 1 - n1/N and 1 - n2/N represent finite population correction factors; the first quantity is such that SE(p1) is equal to 0 if n1 = N and the second quantity is such that SE(p2) is equal to 0 if n2 = N. (SE equal to 0 means no uncertainty in estimating the population proportion of interest; so these population correction factors force SEs to be zero in cases where one of the first 2 of the 4 categories includes the entire population.)   

    It strikes me that it would actually be easier to have the correction factor depend only on n/N, like so: 

    SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n/N) )

    SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n/N) )

    where n = n1 + n2 + n3 + n4 is the total sample size.

    Question: 

    Because p1, p2, p3 and p4 are assumed to follow a multinomial distribution, we need to compute not just SE(p1) and SE(p2) but also Cov(p1, p2), where Cov stands for covariance. How should this covariance be adjusted to account for the fact that we are sampling from a finite population?


    Partial Answer: 

    The paragraph below gives the formula for the estimated Cov(p1, p2) without a finite population correction: 

    Cov(p1, p2) = -p1*p2/n 

    but mentions that it is possible to apply a finite population correction to this formula.   On the surface of it, I suspect that one could apply a finite population correction like this: 

    Cov(p1, p2) = -p1*p2/n * (1 - n1/N) * (1 - n2/N)

    This type of correction would force the covariance to be 0 when either n1 = N or n2 = N, that is, when one of the first two categories contains the entire population. Would that make sense?

    But this correction is just based on a wild guess.  Can anyone familiar with multinomial proportions confirm what the correct way to apply the finite population correction to Cov(p1,p2) is? 
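
    One way to sanity-check a candidate correction is a small simulation. The sketch below assumes draws are made without replacement (the setting in which a finite population correction applies) and uses a made-up population of N = 200 items with category counts 80, 60, 40 and 20; the function name simulate_cov, the counts, the sample size and the seed are all illustrative assumptions. It compares the simulated Cov(p1, p2) with the uncorrected value -P1*P2/n and with that value multiplied by (N - n)/(N - 1), which is the standard factor for the multivariate hypergeometric distribution and is close to 1 - n/N when N is large.

```python
import random


def simulate_cov(counts, n, reps=20000, seed=1):
    """Monte Carlo estimate of Cov(p1, p2) when n items are drawn
    without replacement from a population with the given category counts."""
    rng = random.Random(seed)
    # Build the finite population: one category label per item.
    population = [cat for cat, k in enumerate(counts) for _ in range(k)]
    s1 = s2 = s12 = 0.0
    for _ in range(reps):
        sample = rng.sample(population, n)  # sampling without replacement
        p1 = sample.count(0) / n
        p2 = sample.count(1) / n
        s1 += p1
        s2 += p2
        s12 += p1 * p2
    m1, m2 = s1 / reps, s2 / reps
    return s12 / reps - m1 * m2  # E[p1*p2] - E[p1]*E[p2]


counts = [80, 60, 40, 20]  # hypothetical population counts for C1..C4
N, n = sum(counts), 50
P1, P2 = counts[0] / N, counts[1] / N

empirical = simulate_cov(counts, n)
uncorrected = -P1 * P2 / n
corrected = uncorrected * (N - n) / (N - 1)

print(f"simulated   Cov(p1, p2): {empirical:.6f}")
print(f"uncorrected -P1*P2/n   : {uncorrected:.6f}")
print(f"with (N-n)/(N-1) factor: {corrected:.6f}")
```

    The same harness can be pointed at any other candidate, such as the product (1 - n1/N) * (1 - n2/N), by adding one more line to the comparison.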

    Here is the paragraph in question, which comes from page 11 of https://www.canr.msu.edu/qfc/publications/pdf-techreports/2013-techreports/T2013-01.pdf:


    Edit: 

    In thinking about this some more, it strikes me that it would actually be simpler to have the correction factor depend only on n/N, like so: 


    SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n/N) )

    SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n/N) )

    where, again, n = n1 + n2 + n3 + n4 is the total sample size.   Then perhaps the covariance could be corrected with the same factor:

    Cov(p1, p2) = -p1*p2/n * (1 - n/N)  ?
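
    To make the proposal concrete, here is a minimal sketch of the interval that results if the single factor (1 - n/N) is applied to both variances and to the covariance. It is an illustration under stated assumptions, not a confirmed answer: the function name diff_ci, the default z = 1.96 and the example counts are hypothetical, and the sketch divides by the total sample size n throughout (as in the covariance formula) rather than by n1 or n2.

```python
import math


def diff_ci(x1, x2, n, N, z=1.96):
    """Normal-approximation CI for P1 - P2, applying one common
    finite population correction (1 - n/N) to every term."""
    p1, p2 = x1 / n, x2 / n
    fpc = 1 - n / N                      # common correction factor
    var1 = p1 * (1 - p1) / n * fpc       # estimated Var(p1)
    var2 = p2 * (1 - p2) / n * fpc       # estimated Var(p2)
    cov12 = -p1 * p2 / n * fpc           # estimated Cov(p1, p2)
    se = math.sqrt(var1 + var2 - 2 * cov12)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical data: 20 items in C1 and 15 in C2 out of n = 50, with N = 200.
lo, hi = diff_ci(x1=20, x2=15, n=50, N=200)
print(f"95% CI for P1 - P2: ({lo:.4f}, {hi:.4f})")
```

    With a common factor, the interval collapses to a single point when n = N, which matches the intuition that a full census leaves no sampling uncertainty in any of the proportions.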


    Thank you very much for any tidbits of insight you can throw my way.  It's been a long time since I had to look at covariance formulas. (: 

    Isabella 


    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    301-7031 Blundell Road
    Richmond, B.C., V6Y 1J5
    Canada

    Tel: 604-767-1250
    Email: isabella@ghement.ca
    ------------------------------