Zachary,
I suggest you write out what you are trying to do in words. I think you are more likely to get a response to that than the pretty dense R code you've provided. [Not everyone will understand the data frame you are using (or necessarily have access to it) nor will everyone (many?) people understand the R code so getting a sense of your intent is important.]
For example (and I have no idea if this is even in the ballpark of what you are trying to do):
I have a list of records that comprise a sampling frame and I would like to examine the sampling distribution of a particular estimate (state the estimate here) for a sampling design that involves two stages of cluster sampling. In my list, the first stage primary sampling unit (PSU) is "Variable X" while the second stage sampling unit (Secondary Sampling Unit or SSU) is "Variable Y."
I plan to conduct a simulation that selects K PSUs and J SSUs and generates my estimate of interest for each of the simulated samples. I will use the entire set of simulated estimates to estimate the sampling distribution of the estimate.
Then describe what you want to compare. For example, do you want to compare the simulated variance of your estimate to some closed-form estimate of the variance that one can derive from a single sample?
Best of luck,
Dave
-------------------------------------------
David Wilson
Senior Research Statistician
RTI, International
-------------------------------------------
Original Message:
Sent: 10-31-2014 09:16
From: Zachary Scott
Subject: Bootstrapping makeshift 2 stage cluster sampling code in R?
This message has been cross posted to the following eGroups: Young Professionals Group and ASA Connect .
-------------------------------------------
I have written code for a complex 2 staged sample in R, I want to compare mean estimates of the results from this code to other methods however, it is not a function or a dataset, simply code that creates the results. I have pasted the 2nd stage part my code which is being produced for multiple sample sizes, I want to complete the simulation w replication. I have attached code. Thanks for your help.
library(devtools) source_gist(6424112) en_min<-15 en_max<-100 SSS <- matrix(c(rep(0,2*(en_max-en_min+1))),2,(en_max-en_min+1)) ps_min=2 psmax=13 for (ps in ps_min:psmax) { nabida<-merge(hoto,catte,by="Invoice..") ratatabin<-as.data.frame(table(pframe$Invoice..)) shestamar<-as.data.frame(table(nabida$Invoice..)) colnames(shestamar)[1]<-c("Invoice..") rotodoto<-merge(nabida,shestamar,by="Invoice..") beg2ndb<-stratified(rotodoto[order(rotodoto$Invoice..),], group = "Invoice..", size=ps) lutut<-as.data.frame(table(beg2ndb$Invoice..)) colnames(lutut)[1]<-c("Invoice..") colnames(shestamar)[1]<-c("Invoice..") wytiq<-merge(beg2ndb,lutut,by="Invoice..") wytiq$secondselection<-wytiq$Freq.y/wytiq$Freq.x wytiq wytiq$finalweight<-(1/wytiq$secondselection)*wytiq$samwt wytiq en<-nrow(wytiq) ssuest<-sum(wytiq$finalweight*wytiq$COMMISSION..Profit.Ratio.) Tcritssu=qt(0.95, en-1) twatiri<-nrow(pframe) lestamasia<-((twatiri-en)/(twatiri-37))**(((twatiri-37)/((twatiri-en)*((en/37)**2)))) ssuest-Tcritssu*sqrt(lestamasia*variance_ssupr) hetadan<-ssuest-Tcritpsu*sqrt(lestamasia*variance_psupr) SSS[1,en-(en_min-1)]=en SSS[2,en-(en_min-1)]=ssuest-Tcritpsu*sqrt(lestamasia*variance_ssupr)}