ASA Connect

  • 1.  Random assignment to groups

    Posted 05-29-2020 12:41
    Hi all:

    I am working with a set of samples that I would like to randomize into three different groups for processing. (These are biological samples going through a mass spectrometer in three batches.) For each sample, we know a disease state, age, sex, and source (clinic). All of these variables are expected to have some influence on the measurements, and we want the batches to be balanced so that variation due to the instrument (batch) is not confounded with any of these variables. Is it important to balance the three batches on all of these variables (i.e. to partition the data set using the four variables)? What about for eventual randomization into a training set and a validation set? Can or should we include all the variables when randomizing into the sets?

    Thanks for the group wisdom and experience.

    ------------------------------
    Barbara Graham
    Biostatistician
    Colorado State University
    ------------------------------


  • 2.  RE: Random assignment to groups

    Posted 06-01-2020 09:36
    Hello Barbara,

    You did not mention the total number of samples, but you do not want to stratify on so many variables relative to your sample size that you end up with sparse strata. In stratified randomization it is generally advisable to keep the number of stratification variables to a minimum and to adjust for the other prognostic variables in a multivariable model, possibly as random effects where appropriate. Another option would be minimization.
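
    As a rough illustration of stratifying on only a couple of variables, here is a minimal Python sketch. The sample data, the field names (`id`, `disease`, `sex`), and the function `assign_batches` are all made up for the example. It shuffles the samples within each stratum and deals them out round-robin, so each stratum is spread as evenly as possible over the three batches.

```python
import random
from collections import Counter, defaultdict

def assign_batches(samples, strata_keys, n_batches=3, seed=2020):
    """Return {sample id: batch number}, balanced within each stratum."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    strata = defaultdict(list)
    for s in samples:
        strata[tuple(s[k] for k in strata_keys)].append(s["id"])

    assignment = {}
    offset = 0  # rotate the starting batch so leftovers do not pile up in batch 1
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)  # random order within the stratum
        for i, sid in enumerate(ids):
            assignment[sid] = (offset + i) % n_batches + 1
        offset = (offset + len(ids)) % n_batches
    return assignment

# Hypothetical data: 3 disease groups x 2 sexes x 7 samples each.
samples = []
for disease in ("A", "B", "C"):
    for sex in ("F", "M"):
        for _ in range(7):
            samples.append({"id": len(samples), "disease": disease, "sex": sex})

batches = assign_batches(samples, ["disease", "sex"])
print(Counter(batches.values()))  # overall batch sizes
```

    Variables left out of `strata_keys` (age and clinic, say) are balanced only by chance in this sketch and would be handled by adjustment in the model, as described above.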

    Robert

    ------------------------------
    Robert O'Brien
    ------------------------------



  • 3.  RE: Random assignment to groups

    Posted 06-02-2020 10:58
    Thank you Robert,

    The problem of sparse strata is one that I was particularly worried about. While we have hundreds of samples, the researchers also had dozens of potential disease states. I asked them to use more generic groups of diseases that are more relevant to the main question of interest. I appreciate your input.

    Barbara

    ------------------------------
    Barbara Graham
    Biostatistician
    Colorado State University
    ------------------------------



  • 4.  RE: Random assignment to groups

    Posted 06-01-2020 10:14
    Hi Barbara,

    If you decide to use stratification variables for randomization, I have found SAS Proc SurveySelect to be very helpful.  In this example, there are two sites with cohorts of 6, 4 treatment and 2 controls in each cohort.  This example can easily be modified for more stratification variables.  The outorder=random option will randomly order the group assignment in each cohort.

    /* 2 sites with 4 treatment, 2 control in each cohort (or block) */
    Data BaseList;
    Input SiteID TrtGrp @@;
    Datalines;
    1 0 1 0 1 1 1 1 1 1 1 1
    2 0 2 0 2 1 2 1 2 1 2 1
    ;
    Run;

    /* TrtGrp 1=intervention, 0=control */
    /* Select all 6 records per site, in random order, for each of 6 replicates (cohorts) */
    proc surveyselect data=BaseList method=srs n=6 outorder=random
    reps=6 seed=52019 out=MICHR;
    strata SiteID;
    run;

    ------------------------------
    Brandy Sinco, BS, MA, MS
    Statistician Senior
    Michigan Medicine
    ------------------------------



  • 5.  RE: Random assignment to groups

    Posted 06-02-2020 11:53
    Barbara:

    Would it be better to block the independent variables (disease state, age, sex, source), rather than randomize them?
    That way, you can ensure they are balanced across the batches.


    ------------------------------
    Fred Girshick
    ------------------------------



  • 6.  RE: Random assignment to groups

    Posted 06-02-2020 12:30
    Hi Fred:

    In a model, blocking on a variable may be a better answer. For my specific case, we are dividing the samples into groups for processing on a mass spectrometer, and want to be sure not to over-represent a group in a particular run. The mass spec is known for introducing noise, and we don't want to misinterpret results as indicative of a particular group when in fact they are more indicative of a particular run.

    I did ask the researcher for less specific grouping, but I do wonder if it would be better to divvy up each little group independently (so that each group is represented in each run of the instrument), or whether randomizing on the larger groups is sufficient.
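
    One way to look at this empirically, as a minimal Python sketch: randomize within the coarser groups only, then tabulate how the finer groups landed in each run. Everything here is hypothetical (the labels `d1` to `d5` and `G1`/`G2`, the sample counts, and the helpers `deal` and `balance_table`); it only shows the shape of the check, not a recommendation.

```python
import random
from collections import Counter

rng = random.Random(7)  # fixed seed so the result is reproducible

# Hypothetical samples: each fine disease label is nested in a coarse group.
fine_to_coarse = {"d1": "G1", "d2": "G1", "d3": "G2", "d4": "G2", "d5": "G2"}
samples = [
    {"id": i, "fine": fine, "coarse": fine_to_coarse[fine]}
    for i, fine in enumerate(f for f in fine_to_coarse for _ in range(6))
]

def deal(ids, n_batches=3):
    """Shuffle ids, then deal them round-robin into batches 1..n_batches."""
    ids = list(ids)
    rng.shuffle(ids)
    return {sid: i % n_batches + 1 for i, sid in enumerate(ids)}

# Randomize within the coarse groups only.
batch_of = {}
for group in sorted(set(fine_to_coarse.values())):
    batch_of.update(deal(s["id"] for s in samples if s["coarse"] == group))

def balance_table(variable):
    """Counts per run (1, 2, 3) for each level of the given variable."""
    counts = Counter((s[variable], batch_of[s["id"]]) for s in samples)
    levels = sorted({s[variable] for s in samples})
    return {lvl: [counts[(lvl, b)] for b in (1, 2, 3)] for lvl in levels}

for variable in ("coarse", "fine"):
    print(variable, balance_table(variable))
```

    The coarse groups come out even across runs by construction, while the fine groups are balanced only by chance, so the second table shows how uneven the smaller groups can get when they are not used for stratification.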

    Thank you,
    Barbara

    ------------------------------
    Barbara Graham
    Biostatistician
    Colorado State University
    ------------------------------