Hi everyone, I am working on a multinomial proportions type of problem and need some help.

**Problem:**

We sample n items out of a finite population of size N (at random, with replacement) and classify each into one of 4 categories: C1, C2, C3 or C4. The numbers of sampled items falling into these categories are x1, x2, x3 and x4, respectively, so that x1 + x2 + x3 + x4 = n. We can use p1 = x1/n, p2 = x2/n, p3 = x3/n and p4 = x4/n as estimates of the true proportions P1, P2, P3 and P4 of items falling into the 4 categories in the population.

The goal is to construct a confidence interval for the difference in the true proportions, P1 - P2, using the estimated difference p1 - p2 together with its estimated standard error.

Because the population is finite, the estimated standard errors associated with p1 and p2 would look like this (writing n1 = x1 and n2 = x2 for the category counts):

SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n1/N) )

SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n2/N) )

The quantities 1 - n1/N and 1 - n2/N are finite population correction factors: the first makes SE(p1) equal to 0 if n1 = N, and the second makes SE(p2) equal to 0 if n2 = N. (An SE of 0 means there is no uncertainty in estimating the population proportion of interest, so these correction factors force the SEs to zero when one of the first 2 of the 4 categories includes the entire population.)

It strikes me that it would actually be easier to have the correction factor depend only on n/N, like so:

SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n/N) )

SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n/N) )

where, recall, n = n1 + n2 + n3 + n4.
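In case it helps to see how much the two variants differ in practice, here is a minimal Python sketch that evaluates both forms of the SE exactly as written above. The population size and the category counts are made-up numbers for illustration only, `se_fpc` is a hypothetical helper name, and n1, n2 are taken to be the category counts x1, x2.

```python
import math


def se_fpc(p, n_cat, sampling_fraction):
    """SE in the form used in this post: sqrt(p*(1-p)/n_cat * (1 - f)),
    where f is whichever sampling fraction the correction is based on."""
    return math.sqrt(p * (1 - p) / n_cat * (1 - sampling_fraction))


# Hypothetical numbers, chosen only for illustration.
N = 1000                      # population size
counts = [40, 30, 20, 10]     # x1, x2, x3, x4
n = sum(counts)               # total sample size
n1, n2 = counts[0], counts[1]
p1, p2 = n1 / n, n2 / n

# Variant 1: correction based on the per-category fractions n1/N and n2/N.
se_p1_cat = se_fpc(p1, n1, n1 / N)
se_p2_cat = se_fpc(p2, n2, n2 / N)

# Variant 2: correction based on the overall sampling fraction n/N.
se_p1_all = se_fpc(p1, n1, n / N)
se_p2_all = se_fpc(p2, n2, n / N)

print("per-category correction:", se_p1_cat, se_p2_cat)
print("overall n/N correction: ", se_p1_all, se_p2_all)
```

Because n/N is at least as large as n1/N or n2/N, the second variant always gives the smaller (or equal) SE for the same inputs.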

**Question:**

Because p1, p2, p3 and p4 are assumed to follow a multinomial distribution, we need to compute not just SE(p1) and SE(p2) but also Cov(p1, p2), where Cov stands for covariance. **What is the correct way to adjust this covariance to account for the fact that we are sampling from a finite population?**

**Partial Answer:**

The paragraph below gives the formula for the estimated Cov(p1, p2) **without** a finite population correction:

Cov(p1, p2) = -p1*p2/n

but mentions that it is possible to apply a finite population correction to this formula. On the surface of it, I suspect that one could apply a finite population correction like this:

Cov(p1, p2) = -p1*p2/n * (1 - n1/N) * (1 - n2/N)

This type of correction would force the covariance to be 0 when either n1 = N or n2 = N, that is, when one of the first two categories includes the entire population. Would that make sense?

But this correction is just a wild guess. **Can anyone familiar with multinomial proportions confirm the correct way to apply the finite population correction to Cov(p1, p2)?**
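One way to sanity-check any candidate correction would be a small simulation. The sketch below is only that, a sketch under stated assumptions: it draws samples *without* replacement (the setting in which a finite population correction applies), uses a made-up population of 200 items, and compares the empirical Cov(p1, p2) with the uncorrected value -P1*P2/n, with that value multiplied by (1 - n/N), and, for reference, with the standard multivariate hypergeometric covariance, which uses the factor (N - n)/(N - 1). The function name `simulate_cov` and all numbers are hypothetical.

```python
import random


def simulate_cov(pop_counts, n, reps=20000, seed=0):
    """Empirical Cov(p1, p2) under simple random sampling WITHOUT replacement.

    pop_counts holds the (assumed known) population counts per category.
    """
    rng = random.Random(seed)
    # Represent the population as a list of category labels 0, 1, 2, ...
    population = [cat for cat, k in enumerate(pop_counts) for _ in range(k)]
    p1s, p2s = [], []
    for _ in range(reps):
        sample = rng.sample(population, n)   # draw n items without replacement
        p1s.append(sample.count(0) / n)
        p2s.append(sample.count(1) / n)
    m1 = sum(p1s) / reps
    m2 = sum(p2s) / reps
    # Sample covariance across the simulated replicates.
    return sum((a - m1) * (b - m2) for a, b in zip(p1s, p2s)) / (reps - 1)


# Hypothetical population, chosen only for illustration.
pop_counts = [80, 60, 40, 20]
N = sum(pop_counts)
n = 50
P1, P2 = pop_counts[0] / N, pop_counts[1] / N

empirical = simulate_cov(pop_counts, n)
uncorrected = -P1 * P2 / n                    # multinomial formula, no correction
candidate = uncorrected * (1 - n / N)         # correction based on n/N
hypergeom = uncorrected * (N - n) / (N - 1)   # multivariate hypergeometric covariance

print("empirical  :", empirical)
print("uncorrected:", uncorrected)
print("candidate  :", candidate)
print("hypergeom  :", hypergeom)
```

The simulation uses the true proportions P1 and P2 rather than estimates, so it speaks to the form of the correction factor and not to how the covariance should be estimated from a single sample.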

Here is the paragraph in question, which comes from page 11 of https://www.canr.msu.edu/qfc/publications/pdf-techreports/2013-techreports/T2013-01.pdf:

**Edit:**

In thinking about this some more, it strikes me that it would actually be simpler to have the correction factor depend only on n/N, like so:

SE(p1) = sqrt( p1 *(1-p1)/n1 * (1 - n/N) )

SE(p2) = sqrt( p2 *(1-p2)/n2 * (1 - n/N) )

where, recall, n = n1 + n2 + n3 + n4. Then perhaps we can just have the covariance corrected like this:

Cov(p1, p2) = -p1*p2/n * (1 - n/N) ?

Thank you very much for any tidbits of insight you can throw my way. It's been a long time since I had to look at covariance formulas. (:

Isabella

------------------------------

Isabella R. Ghement, Ph.D.

Ghement Statistical Consulting Company Ltd.

301-7031 Blundell Road

Richmond, B.C., V6Y 1J5

Canada

Tel: 604-767-1250

Email: isabella@ghement.ca

------------------------------