Hi Fred:
In a model, blocking on a variable may be a better answer. For my specific case, we are dividing the samples into groups for processing on a mass spectrometer, and want to be sure not to over-represent any group in a particular run. The mass spec is known for introducing noise, and we don't want to misinterpret results as indicative of a particular group when in fact they are more indicative of a particular run.
I did ask the researcher for a less specific grouping, but I do wonder whether it would be better to divvy up each little group independently (so that each group is represented in each run of the instrument), or whether randomizing on the larger groups is sufficient.
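To make the first option concrete, here is a minimal Python sketch of randomizing within each little group and dealing the shuffled samples out to the runs in turn. The function name `assign_batches`, the field names (`id`, `disease`, `sex`), and the sample data are all made up for illustration; this is not our actual pipeline, just one way the idea could be coded.

```python
import random
from collections import defaultdict

def assign_batches(samples, strata_keys, n_batches=3, seed=0):
    """Shuffle within each stratum, then deal samples to batches round-robin.

    Returns a dict mapping sample id -> batch index (0 .. n_batches - 1).
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible

    # Group samples by their combination of stratifying variables.
    strata = defaultdict(list)
    for s in samples:
        strata[tuple(s[k] for k in strata_keys)].append(s)

    assignment = {}
    offset = 0  # carried across strata so overall batch sizes stay even
    for key in sorted(strata):
        members = strata[key]
        rng.shuffle(members)  # random order within the stratum
        for i, s in enumerate(members):
            assignment[s["id"]] = (offset + i) % n_batches
        offset += len(members)
    return assignment

# Hypothetical data: 2 disease states x 2 sexes, 6 samples per combination.
combos = [(d, x) for d in ("case", "control") for x in ("F", "M")] * 6
samples = [{"id": i, "disease": d, "sex": x} for i, (d, x) in enumerate(combos)]

assignment = assign_batches(samples, ["disease", "sex"])
```

Within any stratum the per-batch counts differ by at most one, so no little group can pile up in a single run. Passing only the larger grouping variables in `strata_keys` gives the coarser version of the same scheme.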
Thank you,
Barbara
------------------------------
Barbara Graham
Biostatistician
Colorado State University
------------------------------
Original Message:
Sent: 06-02-2020 11:52
From: Fred Girshick
Subject: Random assignment to groups
Barbara:
Would it be better to block the independent variables (disease state, age, sex, source), rather than randomize them?
That way, you can ensure they are balanced across the batches.
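Whichever way the assignment is done, the balance can be checked afterwards by cross-tabulating each variable against batch. A minimal Python sketch, in which the helper names `batch_table` and `max_imbalance`, the field names, and the two toy designs are hypothetical:

```python
from collections import Counter

def batch_table(samples, variable, batch_key="batch"):
    """Return {level: [count in each batch]} for one variable."""
    counts = Counter((s[variable], s[batch_key]) for s in samples)
    levels = sorted({s[variable] for s in samples})
    batches = sorted({s[batch_key] for s in samples})
    return {lvl: [counts[(lvl, b)] for b in batches] for lvl in levels}

def max_imbalance(table):
    """Largest gap between the fullest and emptiest batch for any level."""
    return max(max(row) - min(row) for row in table.values())

# Toy design 1: each disease state spread evenly over batches 1-3.
balanced = [{"disease": d, "batch": b}
            for d in ("case", "control") for b in (1, 2, 3, 1, 2, 3)]

# Toy design 2: cases concentrated in batch 1, controls in batch 3.
confounded = (
    [{"disease": "case", "batch": b} for b in (1, 1, 1, 1, 2, 2)]
    + [{"disease": "control", "batch": b} for b in (2, 2, 3, 3, 3, 3)]
)

balanced_table = batch_table(balanced, "disease")
confounded_table = batch_table(confounded, "disease")
```

A gap of zero or one for every variable means the batches are as balanced as the counts allow; a large gap, as in the second toy design, means batch and that variable are partly confounded.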
------------------------------
Fred Girshick
------------------------------
Original Message:
Sent: 05-29-2020 12:40
From: Barbara Graham
Subject: Random assignment to groups
Hi all:
I am working with a set of samples that I would like to randomize into three different groups for processing. (These are biological samples going through a mass spectrometer in three batches.) For each sample, we know a disease state, age, sex, and source (clinic). All of these variables are expected to have some influence on the measurements, and we want to be sure there is a balance in the batches so that the variation due to the instrument (batch) is not overly represented in any of the variables. Is it important to balance the three batches on all of these variables (i.e. to partition the data set using the four variables)? What about for eventual randomization into a training set and a validation set? Can or should we include all the variables when randomizing into the sets?
Thanks for the group wisdom and experience.
------------------------------
Barbara Graham
Biostatistician
Colorado State University
------------------------------