Thank you all: Michael, Wayne, and Qing.
I was wondering whether anyone knows the common sampling design used for estimating the prevalence of infection in animals. As I understand it, the disease is contagious across the species mentioned. The researcher's main interests are:
1) The prevalence of infection in livestock in general for the whole region.
2) The prevalence of infection in livestock in general within each farm.
3) The prevalence of infection in livestock in general for each sub-region.
4) The prevalence of infection in each species for the whole region and each sub-region.
Stratifying by both species and region would require a very large sample. Also, how does it make sense to stratify by species when the species are mixed on most farms and the disease is contagious across species?
From what I read in the FAO guidelines, the common practice is to stratify by region, farm size, or species. They treat farms/herds as the primary sampling units and the animals within farms as the secondary sampling units. I am not clear on how to select animals within a farm, or how to decide on the sample size from each selected farm given the existing structure of the farms.
Note: cost has not been considered yet, but the researcher would benefit from a subsample to estimate the prevalence of another disease.
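To make the within-farm question concrete, here is a minimal sketch of one possible second-stage scheme: take a fixed number of animals per selected farm, allocate that number across the species present in proportion to their head counts, and draw a simple random sample within each species. This is an illustration only, not something prescribed by the FAO guidelines; the species names, the function name `select_animals`, and the per-farm size `m` are all hypothetical.

```python
import random

def select_animals(farm_animals, m, seed=None):
    """Pick about m animals from one farm, proportionally across species.

    farm_animals: dict mapping species name -> list of animal IDs (hypothetical layout).
    m: target number of animals to test on this farm (an assumed design choice).
    Returns a dict mapping species name -> list of sampled animal IDs.
    """
    rng = random.Random(seed)
    total = sum(len(ids) for ids in farm_animals.values())
    # Small farm: test every animal.
    if total <= m:
        return {sp: list(ids) for sp, ids in farm_animals.items()}
    # Proportional allocation, rounded with the largest-remainder rule
    # so that the species quotas add up to exactly m.
    quotas = {sp: m * len(ids) / total for sp, ids in farm_animals.items()}
    alloc = {sp: int(q) for sp, q in quotas.items()}
    leftover = m - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda sp: quotas[sp] - alloc[sp], reverse=True)
    for sp in by_remainder[:leftover]:
        alloc[sp] += 1
    # Simple random sample without replacement within each species.
    return {sp: rng.sample(list(ids), alloc[sp]) for sp, ids in farm_animals.items()}

# Hypothetical mixed farm with three species.
farm = {
    "cattle": [f"C{i}" for i in range(60)],
    "sheep": [f"S{i}" for i in range(30)],
    "goats": [f"G{i}" for i in range(10)],
}
chosen = select_animals(farm, m=10, seed=42)
print({sp: len(ids) for sp, ids in chosen.items()})
```

Because the allocation is proportional, animals on the same farm have roughly equal selection probabilities, which keeps the weighting simple when estimating within-farm and overall prevalence.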
------Original Message------
Mohammad,
There are several questions to address before designing your sampling plan.
1. Are you interested solely in estimating the prevalence, or building a model that predicts prevalence based on certain covariates? This will determine the complexity of your design and analyses.
2. Is the disease contagious? If so, the statistical approach needs to take clustering into account.
3. Is there a known factor that might affect the prevalence? If so, stratifying your survey on it will improve your efficiency.
4. Will the sampling cost for each animal be the same? This will affect the sample size estimation.
There are three basic types of statistical sampling methods: simple random sampling, stratified sampling, and cluster sampling. I recommend Sharon Lohr's book as a starting point. Some epidemiology publications on estimating disease prevalence would be helpful as well.
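To illustrate why clustering (question 2) matters for sample size, here is a rough sketch using the standard simple-random-sampling formula for a proportion, inflated by the usual design effect 1 + (m - 1) * rho for clusters of equal size m. The expected prevalence, margin of error, intra-farm correlation, and animals per farm used below are made-up inputs for illustration, not values from this study.

```python
import math

def prevalence_sample_size(p_expected, margin, icc, animals_per_farm, z=1.96):
    """Approximate sample size for estimating a prevalence from a cluster sample.

    p_expected: anticipated prevalence (assumed, e.g. from a pilot study).
    margin: desired half-width of the confidence interval.
    icc: intra-farm (intracluster) correlation, assumed known.
    animals_per_farm: animals tested per selected farm (m).
    z: normal quantile, 1.96 for roughly 95% confidence.
    """
    # Sample size under simple random sampling of animals.
    n_srs = z ** 2 * p_expected * (1 - p_expected) / margin ** 2
    # Design effect for equal-sized clusters.
    deff = 1 + (animals_per_farm - 1) * icc
    n_animals = math.ceil(n_srs * deff)
    n_farms = math.ceil(n_animals / animals_per_farm)
    return {"deff": deff, "n_animals": n_animals, "n_farms": n_farms}

# Hypothetical inputs: 20% prevalence, +/- 5 points, ICC 0.1, 10 animals per farm.
print(prevalence_sample_size(0.20, 0.05, icc=0.1, animals_per_farm=10))
```

With a contagious disease the intra-farm correlation can be large, so the required number of animals grows quickly as more animals are taken per farm; adding farms is usually more efficient than adding animals within farms.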
-------------------------------------------
Qing Kang
Chief Scientist
Statistical Intelligence Group, LLC
-------------------------------------------
Original Message:
Sent: 01-17-2015 17:26
From: Wayne Cornelius
Subject: Sample design and size
It's unclear how many farms there are, both within each region and within each of the implied 7 × (number of regions) types of farms, though I generally agree with classifying them this way. Also, the budget for collecting samples sets a limit on how many samples can be obtained. (That is, I am assuming a substantial expense is involved in collecting each sample.)
Taking "a systematic sample" consisting of some number of farms can be problematic if there is some structure within the lists of farms, because there may be unrecognized autocorrelation between the consecutive selected farms rather than pure independence. Insofar as randomization is concerned, a systematic sample is a sample of size 1--no matter how many farms it encompasses. But if convenience dictates systematic rather than random sampling, then I would recommend partitioning the strata into multiple interpenetrating systematic samples with individual random starting points. (Using 3 to 5 such interpenetrating samples may be satisfactory. More might be prudent if the budget allows.) Then at least you have that many independent estimates of any statistical parameter within each (sub-)stratum and can assess within-stratum variability among the farms.
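The interpenetrating idea can be sketched roughly as follows: split the total sample into k replicates, give each replicate its own random start within a common sampling interval, and then use the spread of the k replicate estimates to gauge variability. The function names, the frame of 100 farm IDs, and the replicate prevalences are hypothetical, and the integer interval is a simplification (farms past the last full interval are never selected).

```python
import math
import random
import statistics

def interpenetrating_systematic(frame, n_total, k, seed=None):
    """Draw k systematic replicates from an ordered list of farms.

    frame: ordered list of farms within one stratum.
    n_total: total number of farms to select across all replicates.
    k: number of interpenetrating replicates.
    """
    rng = random.Random(seed)
    per_rep = n_total // k               # farms per replicate
    interval = len(frame) // per_rep     # common sampling interval
    if interval < k:
        raise ValueError("frame too small for k distinct random starts")
    # Distinct random starts, so the replicates do not overlap.
    starts = rng.sample(range(interval), k)
    return [[frame[s + j * interval] for j in range(per_rep)] for s in starts]

def replicate_summary(estimates):
    """Mean of the replicate estimates and its standard error."""
    k = len(estimates)
    mean = statistics.mean(estimates)
    se = math.sqrt(statistics.variance(estimates) / k)
    return mean, se

# Hypothetical stratum of 100 farms: 20 farms in 4 replicates of 5.
reps = interpenetrating_systematic(list(range(100)), n_total=20, k=4, seed=7)
for rep in reps:
    print(rep)
# Hypothetical prevalence estimates, one per replicate.
print(replicate_summary([0.18, 0.22, 0.25, 0.19]))
```

With only 3 to 5 replicates the standard error itself is estimated on very few degrees of freedom, which is one reason more replicates might be prudent if the budget allows.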
-------------------------------------------
Wayne Cornelius
-------------------------------------------
Original Message:
Sent: 01-17-2015 15:26
From: Michael Kruger
Subject: Sample design and size
Agricultural sampling is not my area, but here are some thoughts.
You have three regions.
You have seven types of farms, depending on which of the three types of animals they have.
I would take region as your primary stratification variable, and then within each region I would order the farms by type of animal (001, 010, 100, 011, 101, 110, 111) and take a systematic sample. This ensures that you will get a sample distributed across types of farms (because if a disease can be spread across species, that may occur at a different rate on farms with more than one species). It seems likely to me that the type of farm would vary by region, which is why I'd suggest making region the primary stratification.
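As a rough sketch of the ordering step within one region: sort the farm list by the type code, then take every k-th farm from a random start. The farm IDs, the tuple layout, and the function name are hypothetical; the type codes are the seven listed above.

```python
import random

# Farm-type codes in the order suggested above (one digit per animal type).
TYPE_ORDER = ["001", "010", "100", "011", "101", "110", "111"]

def systematic_sample_by_type(farms, n, seed=None):
    """Systematic sample of n farms from one region, ordered by farm type.

    farms: list of (farm_id, type_code) tuples for a single region.
    n: number of farms to select in this region.
    """
    rng = random.Random(seed)
    rank = {code: i for i, code in enumerate(TYPE_ORDER)}
    # Sorting by type spreads the sample across farm types (implicit stratification).
    ordered = sorted(farms, key=lambda farm: rank[farm[1]])
    interval = len(ordered) / n          # fractional sampling interval
    start = rng.random() * interval      # random start within the first interval
    picks = []
    for j in range(n):
        idx = min(int(start + j * interval), len(ordered) - 1)
        picks.append(ordered[idx])
    return picks

# Hypothetical region with 10 farms of each of the seven types.
region = [(f"farm{t}_{i}", t) for t in TYPE_ORDER for i in range(10)]
sample = systematic_sample_by_type(region, n=14, seed=1)
print(sample)
```

Repeating this independently in each region gives the region-level stratification, while the ordering by type spreads the sample across farm types without creating 7 explicit strata per region.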
In terms of sample sizes, you might want to look at the formulas for this in Gerald van Belle's book, Statistical Rules of Thumb, 2nd edition. The nice thing about this book is that it has a variety of formulas depending on what information you can get from the client on the type of effect they'd like to measure. I've taken many of the formulas from this book (and some other sources) and put them into a spreadsheet you can find here: https://sites.google.com/site/krugersite3/assignments/statisticalrulesofthumb
-------------------------------------------
Michael Kruger
Information Resources Inc
-------------------------------------------