Discussion

  • 1.  Complex Survey Weighting and Stratification

    Posted 06-13-2016 16:43

    I'm working with a client who looked for a statistician only after they had already created the project design, so I'm tasked not with telling them what the project died of, but with trying to bring it back to life. Fisher is rolling over in his grave. I'd like confirmation that what I'm seeing is correct. Thank you for any help you can provide.

    The project is collecting education data in a foreign country. The country has 8 provinces, each province has many districts (somewhere between 6 and 30), and each district has several hundred or even several thousand schools. They selected 2-5 districts in each province to be in Cohort 1, another 2-5 for Cohort 2, and another 2-5 for Cohort 3. These are the districts the project is working in.

    For data collection, they decided to sample one district in each province-cohort combination.

    To me, this sounds like a fairly straightforward two-stage design:

    Stage   Sampling unit   Stratification variables
    1       District        Cohort and province
    2       School          None

    However, with this design there is only one sampled unit per stratum at the first stage, so I get no variance estimates: estimating the within-stratum variance requires at least two sampled units (PSUs) per stratum.
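    A minimal sketch of why the first stage gives no variance, assuming the common with-replacement ("ultimate cluster") approximation, in which each stratum contributes n_h/(n_h - 1) times the sum of squared deviations of its PSU totals from the stratum mean. The function name, the "province-cohort" labels, and the totals are all hypothetical, invented for illustration.

```python
from collections import defaultdict


def stratified_total_variance(psu_totals):
    """Ultimate-cluster (with-replacement) variance of an estimated total.

    psu_totals: iterable of (stratum_label, weighted_psu_total) pairs.
    Each stratum contributes n_h / (n_h - 1) * sum((t_hi - tbar_h) ** 2).
    Returns None if any stratum has a single PSU, since that term has a
    zero divisor and the stratum's variance cannot be estimated.
    """
    by_stratum = defaultdict(list)
    for stratum, total in psu_totals:
        by_stratum[stratum].append(total)

    variance = 0.0
    for totals in by_stratum.values():
        n_h = len(totals)
        if n_h < 2:
            return None  # one PSU in this stratum: no within-stratum variance
        mean_h = sum(totals) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals)
    return variance


# Hypothetical "province-cohort" strata; the totals are made up.
one_per_stratum = [("P1-C1", 120.0), ("P1-C2", 95.0), ("P2-C1", 140.0)]
two_per_stratum = [("P1-C1", 120.0), ("P1-C1", 110.0),
                   ("P1-C2", 95.0), ("P1-C2", 105.0)]

print(stratified_total_variance(one_per_stratum))  # None
print(stratified_total_variance(two_per_stratum))  # 200.0
```

    With one district per stratum, every stratum hits the n_h = 1 case, which is exactly the situation described above; a second district per stratum is what makes the term computable.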

    I guess I'm trying to figure out whether there is a way I can fiddle with this design in order to get variance estimates, and thus things like F statistics for regressions or confidence intervals for my mean estimates. I can't think of a way to do that, short of sending the teams back into the field to collect data in a second district in each cohort and province. Suggestions? Thank you for any help or guidance you can provide.

    ------------------------------
    Michael Costello
    Consultant
    ------------------------------


  • 2.  RE: Complex Survey Weighting and Stratification

    Posted 06-13-2016 21:13

    The most typical shortcut for variance estimation in such circumstances is to forgo stratification. That way, you'd be taking deviations of the PSU means from the grand mean rather than from the stratum means, so the variance estimates include the unnecessary between-strata components and thus are conservative.
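    A small sketch of the "forgo stratification" shortcut, under the same with-replacement (ultimate cluster) approximation: pool all PSUs into a single stratum so deviations are taken from the grand mean. To make the between-strata inflation visible, the example uses hypothetical data with two PSUs per stratum, where both versions can be computed; with one PSU per stratum only the pooled version exists. The function names and numbers are illustrative assumptions.

```python
def ultimate_cluster_variance(totals):
    """n / (n - 1) * sum of squared deviations of PSU totals from their mean."""
    n = len(totals)
    if n < 2:
        raise ValueError("need at least two PSUs")
    mean = sum(totals) / n
    return n / (n - 1) * sum((t - mean) ** 2 for t in totals)


def stratified_variance(strata):
    """Sum of within-stratum variances; strata maps label -> list of PSU totals."""
    return sum(ultimate_cluster_variance(t) for t in strata.values())


def unstratified_variance(strata):
    """Ignore the strata: pool every PSU, deviations taken from the grand mean."""
    pooled = [t for totals in strata.values() for t in totals]
    return ultimate_cluster_variance(pooled)


# Hypothetical totals: two PSUs per stratum, strata at very different levels.
strata = {"A": [10.0, 12.0], "B": [30.0, 32.0]}
print(stratified_variance(strata))    # 8.0
print(unstratified_variance(strata))  # roughly 538.67

# One PSU per stratum (the design in the question): only pooling works.
single = {"A": [10.0], "B": [30.0]}
try:
    stratified_variance(single)
except ValueError as err:
    print("stratified:", err)
print(unstratified_variance(single))  # 400.0
```

    The gap between 8.0 and about 538.67 is the between-strata component that pooling absorbs; it is what makes the shortcut conservative (in expectation) rather than wrong.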

    Even then, you are still ridiculously short on degrees of freedom: you only have 8 PSUs minus one grand mean = 7 d.f. Not only do you get fatter tails (t rather than normal critical values) for the estimates of the means/proportions, but your regression models also physically cannot support more than 7 regressors.
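    A standard-library sketch of the degrees-of-freedom arithmetic and what it costs in interval width, assuming the common rule of thumb d.f. = number of PSUs minus number of strata. The helper names are hypothetical, and the t quantile is computed numerically (Simpson's rule plus bisection) instead of with a stats package. The 8 PSUs are the count used in the reply; if one district is taken in each of the 8 x 3 province-cohort cells, there would be 24 PSUs, giving 23 d.f. with strata ignored (and 24 - 24 = 0 if all strata are kept). Because the design-based covariance matrix of the coefficients has rank at most d.f., joint Wald/F tests on more coefficients than that cannot be formed.

```python
import math
from statistics import NormalDist


def design_df(n_psu, n_strata):
    """Common rule of thumb: design degrees of freedom = #PSUs - #strata."""
    return n_psu - n_strata


def t_pdf(x, df):
    """Student t density, computed on the log scale for stability."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule over [0, x]."""
    h = x / steps
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3


def t_quantile(p, df, lo=0.0, hi=50.0, tol=1e-8):
    """Upper quantile (0.5 < p < 1) found by bisection on t_cdf."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)  # normal critical value for a 95% CI
for n_psu in (8, 24):
    df = design_df(n_psu, n_strata=1)  # strata ignored: one grand mean
    t = t_quantile(0.975, df)
    print(f"{n_psu} PSUs: df={df}, t={t:.3f}, CI is {t / z:.0%} as wide as normal")
```

    With these inputs the t critical value should come out near 2.365 for 7 d.f. and near 2.069 for 23 d.f., against 1.960 from the normal, so the 7 d.f. intervals are roughly a fifth wider before any design effect is counted.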

    A fairly unfortunate design, indeed.

    ------------------------------
    Stanislav Kolenikov
    Principal Survey Scientist
    Abt SRBI
    Education Officer, Survey Research Methods Section



  • 3.  RE: Complex Survey Weighting and Stratification

    Posted 06-14-2016 09:48

    It might depend on the client's goal for the project. If they want to know about variation in the data across cohorts/provinces/districts, then part of your work as a consultant is to explain that more data is needed and why, and then help them design and execute a better collection method.

    However, if they only want some preliminary analysis and do not need to know about differences across strata, then you can do as has been suggested and simply ignore the stratification in the data you already have.

    ------------------------------
    Tim Young
    Senior Analyst-Statistics & Analytics
    Southwestern Illinois College