Discussion: View Thread

  • 1.  Correct subjects in SAS analysis of repeated measures

    Posted 03-04-2013 21:42
    This message has been cross posted to the following eGroups: Biometrics Section and Statistical Consulting Section.
    -------------------------------------------

    Last summer I had a problem that this group answered very well; thank you all (George Milliken's response was especially useful). Now I have run into another problem: trying to get SAS to use the correct subject in repeated measures analyses. Recently I have had to analyze several Randomized Complete Block Designs that were repeated over a series of years. The only way I can see to construct the SAS statements for these analyses results in the Blocks being used as the subject rather than the Plots within Blocks. As an example, consider yields of alfalfa from plots fertilized with P and K compared to untreated check plots in a Randomized Complete Block Design with 4 blocks over 4 years. The outline for the repeated measures ANOVA of these 32 observations is


    Source              df
    Blocks, B            3
    Trt, T               1
    BxT (Error A)        3
    Years, Y             3
    TxY                  3
    Residual            18


    and the SAS statements I used for the analysis are


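    Proc Mixed;
        Class block trt year;
        Model yield = trt year trt*year / ddfm=kr;
        Random int trt / subject=block;   /* block intercept plus trt-within-block (BxT, Error A) effects */
        Repeated year / subject=block*trt type=arh(1) rcorr;   /* ARH(1) covariance among years within each plot */
    run;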
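As a quick arithmetic check of the outline above, the degrees of freedom for an RCB design with b blocks and t treatments, each plot measured in y years, can be tallied as below. This is a sketch: the function name is hypothetical, and it assumes the Residual line pools the BxY and BxTxY terms, which is what gives 18 here.

```python
def rcbd_over_years_df(b, t, y):
    """Degrees of freedom for an RCB design (b blocks, t treatments) measured in y years."""
    return {
        "Blocks, B": b - 1,
        "Trt, T": t - 1,
        "BxT (Error A)": (b - 1) * (t - 1),
        "Years, Y": y - 1,
        "TxY": (t - 1) * (y - 1),
        # Assumption: Residual pools the BxY and BxTxY terms
        "Residual": (b - 1) * (y - 1) + (b - 1) * (t - 1) * (y - 1),
    }


# 4 blocks, 2 treatments (P and K vs. check), 4 years -> 32 observations
df = rcbd_over_years_df(b=4, t=2, y=4)
for source, d in df.items():
    print(f"{source:<15}{d:>4}")
print(f"{'Total':<15}{sum(df.values()):>4}")  # equals 32 - 1
```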


    SAS makes blocks the subject, but block*trt (which identifies the plots within blocks) should be the subject, at least in my mind. The resulting parameters for the 4x4 ARH(1) covariance matrix make sense based on an examination of the residuals from the variance-components-only analysis. Do I have to settle for this being "close enough", or is there a way to make SAS use Plot(Block) or Block*Trt as the subject? This problem must be common not only in agriculture, where RCB designs are used in a high percentage of field trials, but also in other areas such as education (students within schools as subjects) and medicine (patients within clinics), so hopefully some of you know how to handle it.


    -------------------------------------------
    Jon Baldock
    -------------------------------------------


  • 2.  RE:Correct subjects in SAS analysis of repeated measures

    Posted 03-05-2013 08:54
    Hi Jon,

    I'm wondering: what is the reason you have included "trt" in the random statement? I imagine there could be a reason for doing this, but in my experience it would be very unusual. Typically, I have only seen fixed effects included in the random statement like this when the effect is continuous and the slopes are allowed to vary randomly across subjects. Since your treatment is a class variable, this may overcomplicate the model for what you want to do.

    I might suggest the following version of the code:

    Proc Mixed;
        Class block trt year;
        Model yield = trt year trt*year / ddfm=kr;
        Random block;   /* block variance only; no BxT term */
        Repeated year / subject=block*trt type=arh(1) rcorr;
    run;


    I don't know whether that will change how SAS counts the number of subjects, but it may be worth trying. (Note: I believe "Random block;" will produce the same result as "Random int / subject=block;".)

    - Kim

    -------------------------------------------
    Kim Love-Myers
    Associate Director, UGA SCC
    -------------------------------------------


  • 3.  RE:Correct subjects in SAS analysis of repeated measures

    Posted 03-05-2013 13:52
    Hi Jon,

    You are describing a problem that bothered me for many years until I researched it. It turns out that there is a much deeper problem here than what SAS can or cannot do.

    Here's the issue: the standard analysis that you describe assumes that the repeated measurements are taken within subjects and that subjects are independent of one another. That is, while there may be within-subject correlation between measurements, there is no between-subject correlation. In a medical trial, the patients are not all seen at the same time, so the same relative time for different patients (Week 1, Week 2, etc.) corresponds to different calendar times. Thus, one patient's repeated measurements are surely not influenced by other patients' measurements, nor by some common factor affecting all of their measurements at a given time.

    However, when you take repeated measurements across years in a field trial, this assumption is almost certainly violated.  The common annual environments under which the repeated measurements are taken impart a common random effect on all measurements taken in the same year.  If it's a wet year, a dry year, a hot year, or whatever, ALL measurements taken in that year are similarly affected.  Therefore, the differences you actually see from year to year are a completely confounded combination of the fixed effects you want to find and the random effects that are a nuisance.  

    What's worse is the possibility that different treatments perform differently under the different environmental conditions. If some of the treatments do better than others in wet years but worse than others in hot years, then you have a random Year*TRT interaction, which again is completely confounded with the fixed effects you want to identify.   There is literally no way to separate the fixed and random effects without adding further assumptions to the analysis, such as a model for the fixed-effect pattern.
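To make the shared-year argument concrete, here is a small simulation sketch. It assumes hypothetical standard deviations of 1.0 for the random Year effect and 1.0 for plot-level error, no fixed Year effect at all, and every plot measured in the same two calendar years; the function name is illustrative. The variance of the difference between the two observed year means approaches 2*(sd_year**2 + sd_resid**2 / n), so adding plots shrinks only the second term and the shared year effect never averages away.

```python
import random
import statistics


def year_difference_variance(n_plots, sd_year=1.0, sd_resid=1.0, reps=4000, seed=42):
    """Simulated variance of (mean of year 1) - (mean of year 2) when there is
    no fixed Year effect, only a random Year effect shared by every plot."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # One random draw per calendar year, common to all plots measured that year
        year1, year2 = rng.gauss(0, sd_year), rng.gauss(0, sd_year)
        mean1 = statistics.fmean(year1 + rng.gauss(0, sd_resid) for _ in range(n_plots))
        mean2 = statistics.fmean(year2 + rng.gauss(0, sd_resid) for _ in range(n_plots))
        diffs.append(mean1 - mean2)
    return statistics.variance(diffs)


for n in (4, 20, 100):
    theory = 2 * (1.0 ** 2 + 1.0 ** 2 / n)
    print(f"plots={n:>3}  simulated={year_difference_variance(n):.2f}  theory={theory:.2f}")
```

Even with 100 plots per year, the difference between year means still carries roughly twice the Year variance, which is the nuisance component Tom describes as confounded with the fixed Year effects.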

    Please see the two papers:

    Loughin, T.M. (2006). Improved experimental design and analysis for long-term experiments. Crop Science, 46, 2492-2506.

    Figure 1 from this paper depicts the problem clearly (to my way of thinking, at least!). The random Year effect is a strip factor running orthogonally to all of the other experimental units in the design, which interferes with the fixed-effect analysis. The paper gives suggestions for analyzing these experiments by adding extra assumptions, and it also includes SAS code. More importantly, the paper describes how to design such experiments properly so that the confounding of random and fixed Year effects is no longer an issue. This is the Staggered-Start design: begin different blocks in different years, so that relative time and calendar time are not confounded.

    Loughin, T.M., Roediger, M.P., Milliken, G.A., and Schmidt, J.P. (2007). On the analysis of long-term experiments. Journal of the Royal Statistical Society, Series A, 170, 24-42.

    This paper gives the technical details on the problem and some simulations under very realistic conditions (defined as levels of variance components that have been measured in previously published experiments), showing just how wrong the answers can be if one ignores the issue and "blasts away" with the standard within-subjects repeated measures analysis.
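The Staggered-Start idea can be illustrated by tabulating which relative years (years since a block was started) fall in each calendar year. This is a sketch under assumed conditions, not the layout from the paper: the function name is hypothetical, calendar years are simply numbered from 1, and each block starts `stagger` years after the previous one and is then measured for 4 consecutive years.

```python
def relative_years_by_calendar_year(n_blocks=4, n_years=4, stagger=1):
    """Map each calendar year to the sorted relative years observed in it.

    Block k (0-based) starts in calendar year 1 + k * stagger and is then
    measured in n_years consecutive years (relative years 1..n_years).
    """
    layout = {}
    for k in range(n_blocks):
        start = 1 + k * stagger
        for rel in range(1, n_years + 1):
            layout.setdefault(start + rel - 1, set()).add(rel)
    return {cal: sorted(rels) for cal, rels in sorted(layout.items())}


# Standard design: prints {1: [1], 2: [2], 3: [3], 4: [4]}
print("standard: ", relative_years_by_calendar_year(stagger=0))
# One-year stagger: most calendar years now contain several relative years
print("staggered:", relative_years_by_calendar_year(stagger=1))
```

With `stagger=0` every calendar year contains exactly one relative year, so the two are completely confounded; with a one-year stagger, calendar years 2 through 6 each contain two or more relative years, which is what allows the random calendar-year effect to be separated from the fixed relative-year effect.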

    I hope this helps.  Good luck.

    -Tom.

    -------------------------------------------
    Thomas Loughin
    Simon Fraser University
    www.stat.sfu.ca/~tloughin/STATPAGE.html
    -------------------------------------------


  • 4.  RE:Correct subjects in SAS analysis of repeated measures

    Posted 03-09-2013 17:48
    Dear Jon, here is just an old idea from before Proc Mixed came along: analyze reduced measures over time on your basic randomized block design. I always find there are 6 reduced measures clients find interesting when there are 4 time points: each time point separately, the mean over all 4 time points, and the slope (or linear contrast) over all 4 time points. Calculate each of these for each plot and then do the RCB analysis on each one. The mean tends to correspond to the Trt effect in the repeated measures analysis, and the linear contrast over time tends to correspond to the most interesting piece of the Trt x Year part of the repeated measures analysis. If someone is interested in whether values went up and then down over time, calculate a quadratic contrast on each plot from the 4 time values, but most clients aren't looking for this. Sometimes you may want to choose a primary endpoint from the measures and only consider the others if the primary is significant. Typically, the mean over time or the last time point is chosen as the primary. This approach is often much easier for clients to understand, as well as for old statisticians!

    -------------------------------------------
    Bill Stewart
    Distinguished Biostatistician
    EMB Statistical Solutions, LLC.
    -------------------------------------------
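The reduced-measures approach Bill describes can be sketched in a few lines. This is an illustration under stated assumptions: the yields are made-up numbers, the function names are hypothetical, the four years are assumed equally spaced so the orthogonal linear contrast (-3, -1, 1, 3) can stand in for the slope (it differs from the least-squares slope only by a constant factor, which leaves the F test unchanged), and each reduced measure gets the usual RCB analysis with Trt tested against BxT.

```python
# Orthogonal linear contrast for 4 equally spaced years (assumed spacing)
LINEAR = (-3, -1, 1, 3)


def reduced_measures(series):
    """The six per-plot summaries: each year, the mean, and the linear contrast."""
    measures = {f"year{i + 1}": v for i, v in enumerate(series)}
    measures["mean"] = sum(series) / len(series)
    measures["linear"] = sum(c * v for c, v in zip(LINEAR, series))
    return measures


def rcbd_trt_f(values):
    """Trt F statistic for an RCB design; values maps (block, trt) -> one number."""
    blocks = sorted({b for b, _ in values})
    trts = sorted({t for _, t in values})
    nb, nt = len(blocks), len(trts)
    grand = sum(values.values()) / (nb * nt)
    block_mean = {b: sum(values[(b, t)] for t in trts) / nt for b in blocks}
    trt_mean = {t: sum(values[(b, t)] for b in blocks) / nb for t in trts}
    ss_trt = nb * sum((m - grand) ** 2 for m in trt_mean.values())
    # Error = BxT interaction (residual after removing Block and Trt means)
    ss_err = sum((values[(b, t)] - block_mean[b] - trt_mean[t] + grand) ** 2
                 for b in blocks for t in trts)
    df_trt, df_err = nt - 1, (nb - 1) * (nt - 1)
    if ss_err == 0:
        raise ValueError("no error variation; F is undefined")
    return (ss_trt / df_trt) / (ss_err / df_err), df_trt, df_err


# Made-up yields for illustration only: (block, trt) -> years 1-4
yields = {
    (1, "check"): [4.1, 3.8, 3.5, 3.0], (1, "PK"): [4.6, 4.5, 4.4, 4.2],
    (2, "check"): [3.9, 3.6, 3.2, 2.9], (2, "PK"): [4.4, 4.4, 4.2, 4.0],
    (3, "check"): [4.3, 3.9, 3.6, 3.3], (3, "PK"): [4.8, 4.6, 4.5, 4.3],
    (4, "check"): [4.0, 3.7, 3.4, 3.1], (4, "PK"): [4.6, 4.3, 4.3, 4.1],
}

per_plot = {plot: reduced_measures(s) for plot, s in yields.items()}
for name in per_plot[(1, "check")]:
    f, df1, df2 = rcbd_trt_f({plot: m[name] for plot, m in per_plot.items()})
    print(f"{name:<7} F = {f:7.2f} on ({df1}, {df2}) df")
```

With only two treatments, each F equals the square of a paired t statistic on the PK minus check differences within blocks; turning it into a p-value needs an F table or a statistics library.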