ASA Connect

 View Only
  • 1.  Bootstrapping makeshift 2 stage cluster sampling code in R?

    Posted 11-03-2014 08:04
    This message has been cross posted to the following eGroups: Young Professionals Group and ASA Connect .
    -------------------------------------------


    I have written code for a complex 2 staged sample in R, I want to compare mean estimates of the results from this code to other methods however, it is not a function or a dataset, simply code that creates the results. I have pasted the 2nd stage part my code which is being produced for multiple sample sizes, I want to complete the simulation w replication. I have attached code. Thanks for your help.

    library(devtools)   source_gist(6424112) en_min<-15 en_max<-100 SSS   <- matrix(c(rep(0,2*(en_max-en_min+1))),2,(en_max-en_min+1)) ps_min=2 psmax=13  for (ps in ps_min:psmax) { nabida<-merge(hoto,catte,by="Invoice..")  ratatabin<-as.data.frame(table(pframe$Invoice..))  shestamar<-as.data.frame(table(nabida$Invoice..)) colnames(shestamar)[1]<-c("Invoice..")  rotodoto<-merge(nabida,shestamar,by="Invoice..") beg2ndb<-stratified(rotodoto[order(rotodoto$Invoice..),], group = "Invoice..", size=ps) lutut<-as.data.frame(table(beg2ndb$Invoice..)) colnames(lutut)[1]<-c("Invoice..") colnames(shestamar)[1]<-c("Invoice..")   wytiq<-merge(beg2ndb,lutut,by="Invoice..")  wytiq$secondselection<-wytiq$Freq.y/wytiq$Freq.x  wytiq  wytiq$finalweight<-(1/wytiq$secondselection)*wytiq$samwt  wytiq  en<-nrow(wytiq)  ssuest<-sum(wytiq$finalweight*wytiq$COMMISSION..Profit.Ratio.)  Tcritssu=qt(0.95, en-1)  twatiri<-nrow(pframe) lestamasia<-((twatiri-en)/(twatiri-37))**(((twatiri-37)/((twatiri-en)*((en/37)**2)))) ssuest-Tcritssu*sqrt(lestamasia*variance_ssupr) hetadan<-ssuest-Tcritpsu*sqrt(lestamasia*variance_psupr) SSS[1,en-(en_min-1)]=en SSS[2,en-(en_min-1)]=ssuest-Tcritpsu*sqrt(lestamasia*variance_ssupr)} 



  • 2.  RE: Bootstrapping makeshift 2 stage cluster sampling code in R?

    Posted 11-05-2014 08:27
    Zachary,

    you might get somewhat better comments if you posted this on SRMSNET or to the SRMS section group.

    Your code looks ugly -- see if you can edit it so that the ends of the lines are clear.

    Are you relying on a methodological reference?

    -------------------------------------------
    Stas Kolenikov
    Principal Survey Scientist
    Abt SRBI
    -------------------------------------------




  • 3.  RE: Bootstrapping makeshift 2 stage cluster sampling code in R?

    Posted 11-06-2014 08:50
    Zachary,

     I suggest you write out what you are trying to do in words.  I think you are more likely to get a response to that than the pretty dense R code you've provided. [Not everyone will understand the data frame you are using (or necessarily have access to it) nor will everyone (many?) people understand the R code so getting a sense of your intent is important.]

    For example (and I have no idea if this is even in the ballpark of what you are trying to do):
    I have a list of records that comprise a sampling frame and I would like to examine the sampling distribution of a particular estimate (state the estimate here) for a sampling design that involves two stages of cluster sampling.  In my list, the first stage primary sampling unit (PSU) is "Variable X" while the second stage sampling unit (Secondary Sampling Unit or SSU) is "Variable Y."

    I plan to conduct a simulation that selects K PSUs and J SSUs and generates my estimate of interest for each of the simulated samples. I will use the entire set of simulated estimates to estimate the sampling distribution of the estimate.

    Then describe what you want to compare. For example, do you want to compare the simulated variance of your estimate to some closed-form estimate of the variance that one can derive from a single sample?

    Best of luck,
    Dave

    -------------------------------------------
    David Wilson
    Senior Research Statistician
    RTI, International
    -------------------------------------------