ASA Connect

Advice on proportion confidence interval for a stratified random sample

1. Advice on proportion confidence interval for a stratified random sample

Iris Cheng
Posted 09-09-2015 09:28
Hi, I have a large dataset (18,000 records) and would like to estimate the proportion of records in which a certain condition is found. I stratified the population into 5 groups based on hospital coverage performance, since the number of records per hospital varies widely (some have thousands, some only tens). I drew a 2% simple random sample (SRS) from each stratum, which is as much manual checking as I can afford. However, after checking every sampled record, I found no records matching the condition in one stratum, which gives p_i = 0. The other strata also gave low proportions (I think the condition may be rare).

My questions are

1) Can I still calculate confidence intervals when some strata give a sample proportion of 0?

2) What are the limitations of stratified sampling? I recall reading somewhere that if the proportion in a stratum is too low (less than 0.1), the estimate may not be good.

I appreciate any explanations and references.

Thanks!

------------------------------
Iris Cheng

Graduate Student at Baruch College
------------------------------
2. RE: Advice on proportion confidence interval for a stratified random sample

Joseph Nolan
Posted 09-10-2015 06:47
Hello,

My main suggestion would be to use all of your data. In the age of computers, why is it necessary to reduce 18,000 observations to 360? Once you stratify beyond that, you are left with samples of roughly 72 each (assuming equal sample sizes, which they may or may not be). In particular, since it appears your event is rare, that sample size would not lead to confidence intervals that do a good job (i.e., have reasonable margins of error) of estimating the probability of your event. There are of course methods that adjust for the rare-event issue (sorry, I don't know the references off the top of my head). But I'd start by using everything I had rather than just 2% of it.
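To put rough numbers on the sample sizes mentioned above, here is a minimal Python sketch of the normal-approximation (Wald) margin of error. The 2% prevalence (`p_assumed`) is purely a hypothetical value chosen for illustration, and the calculation ignores any finite population correction.

```python
import math

def wald_margin(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

total_records = 18000
sample_size = round(0.02 * total_records)  # 2% sample of all records
per_stratum = sample_size // 5             # assumes equal allocation over 5 strata

p_assumed = 0.02  # hypothetical prevalence, not taken from the thread

for n in (per_stratum, sample_size):
    margin = wald_margin(p_assumed, n)
    # Print n, the absolute margin, and the margin relative to the proportion.
    print(n, round(margin, 4), round(margin / p_assumed, 2))
```

Under this assumed prevalence, the margin of error for a stratum of 72 is larger than the proportion itself, which is the sense in which the interval would not have a reasonable margin of error.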

Cheers,
Joe

------------------------------
Joseph Nolan
Associate Professor of Statistics
Director, Burkardt Consulting Center
NKU Department of Mathematics & Statistics
------------------------------

3. RE: Advice on proportion confidence interval for a stratified random sample

Iris Cheng
Posted 09-11-2015 12:48
Thanks Joe,

I would also like to use all of the data. The reason for sampling is that it takes a lot of manual labor to check each sampled record against another dataset that does not share a common ID. The goal is to find a matching record and report it as a finding in order to evaluate the existing matching algorithm, which is why the event is rare.

Do you have other thoughts?

Thanks,

------------------------------
Iris Cheng

Graduate Student at Baruch College
------------------------------

4. RE: Advice on proportion confidence interval for a stratified random sample

Margot Tollefson
Posted 10-27-2015 09:27
I just had a paper published by International Researchers on CIs for the hypergeometric distribution. Finding the CI for the number of records with the condition in each stratum, then adding the lower limits together and the upper limits together, should give a CI for the population count; dividing by the size of the population should then give the CI for the proportion. The International Researchers website has been down for the last few days. I have attached the R function that calculates the CIs.
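As a rough illustration of the recipe just described, here is a minimal Python sketch. It is not the attached hyper.CI.R function: it uses a plain test-inversion interval for the hypergeometric distribution, and the stratum sizes, sample sizes, and counts in `strata` are hypothetical. Summing per-stratum limits is likely to give an interval wider than strictly needed, since all strata are unlikely to sit at their limits at once.

```python
from math import comb

def hyper_pmf(k, N, M, n):
    """P(X = k) when n records are drawn without replacement from N,
    of which M have the condition."""
    if k < max(0, n - (N - M)) or k > min(n, M):
        return 0.0
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def hyper_ci(x, N, n, alpha=0.05):
    """Test-inversion interval for M, the number of records with the
    condition in a stratum of size N, given x found in a sample of n."""
    def lower_tail(M):  # P(X <= x | M)
        return sum(hyper_pmf(k, N, M, n) for k in range(0, x + 1))

    def upper_tail(M):  # P(X >= x | M)
        return sum(hyper_pmf(k, N, M, n) for k in range(x, n + 1))

    # M must be at least x and leave room for the n - x sampled non-cases.
    feasible = range(x, N - (n - x) + 1)
    lower = min(M for M in feasible if upper_tail(M) > alpha / 2)
    upper = max(M for M in feasible if lower_tail(M) > alpha / 2)
    return lower, upper

# Hypothetical strata: (stratum size N_h, sample size n_h, records found x_h).
strata = [(400, 8, 0), (300, 6, 1), (100, 2, 0)]

limits = [hyper_ci(x, N, n) for N, n, x in strata]
N_total = sum(N for N, _, _ in strata)

# Add the lower limits and the upper limits, then divide by the population size.
lower_p = sum(lo for lo, _ in limits) / N_total
upper_p = sum(hi for _, hi in limits) / N_total
print(limits, lower_p, upper_p)
```

A stratum with no records found still gets an interval: its lower limit is 0 and its upper limit is the largest count that remains plausible given an empty sample.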

Best,

Margot Tollefson
------------------------------
Margot Tollefson
Consultant
Vanward Statistics

Attachment(s)

hyper.CI.R.txt 6 KB 1 version

5. RE: Advice on proportion confidence interval for a stratified random sample

Phillip Kott
Posted 10-28-2015 09:48
You have two problems. One is that you have a clustered sample, which you need to take into account in estimating both the population proportion and the standard error of that estimate. The other is that the standard Wald coverage interval will likely not be very good. For two-sided coverage, it can be replaced by a modified Wilson interval. See, for example, https://fcsm.sites.usa.gov/files/2014/05/2001FCSM_Kott.pdf. Good one-sided intervals are more difficult to construct.
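To show why the Wald interval struggles here, the following minimal Python sketch compares it with the ordinary Wilson score interval for a simple random sample. This is the unmodified textbook Wilson interval, not the modified version for complex samples referred to above, and the sample size of 72 with zero records found is only an illustrative assumption.

```python
import math

def wald_interval(x, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def wilson_interval(x, n, z=1.96):
    """Ordinary Wilson score interval for a proportion (simple random sample)."""
    p = x / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Illustrative case: no records found in a sample of 72.
print(wald_interval(0, 72))    # collapses to a single point at zero
print(wilson_interval(0, 72))  # upper limit stays above zero
```

With zero records found, the Wald interval has zero width, while the Wilson upper limit reduces to z^2 / (n + z^2), so an interval can still be reported for a stratum where nothing was found.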
------------------------------
Phillip Kott
RTI International
