For a detailed and up-to-date discussion of a practical approach to power analyses in the context of general and generalized linear mixed models, consider Chapter 16 in the recent 2013 book by Walt Stroup (UNL), "Generalized Linear Mixed Models: Modern Concepts, Methods and Applications", or Chapter 7 in the recent 2012 book by the NCCC170 group, edited by Ed Gbur (UArkansas), "Analysis of Generalized Linear Mixed Models in the Agricultural and Natural Resources Sciences".

Original Message:

Sent: 05-11-2015 14:41

From: Paul Thompson

Subject: Power calculation for a mixed effects model

All good questions. I generally ask that they speculate on what will come out. Certainly if you run an experiment, you have some idea (hopefully vague) about what the results are going to be.

As to "small" "medium" and "large", I really don't like those. They allow the PI to be very casual about predictions. If you ask them to come up with values, maybe they think a bit more about it?

1) Ask the PI to speculate on the results

2) For RM analysis, ask how likely it will be that cases will change rank-order position. That can help in getting a notion of the within-subject correlation.

3) In the actual power analysis, I prefer to have a range of a) diffs between C and E groups b) standard deviations c) correlations. I don't think a single "power estimate" is appropriate. There should be a number of estimates, and again this may lead to a dialog.
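The "range of inputs" idea in point 3 can be sketched in Python. This is only an illustration under stated assumptions, not Paul's actual procedure: it assumes a two-group (C vs. E) pre/post design analysed on change scores, equal group sizes, a common standard deviation, and a normal approximation to the test statistic. The function name and all grid values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def power_change_scores(diff, sd, rho, n_per_group, alpha=0.05):
    """Approximate two-sided power for a C-vs-E difference in mean change.

    Assumptions (not from the thread): pre/post scores share one SD,
    their correlation rho is below 1, and a normal approximation holds.
    """
    sd_change = sd * sqrt(2 * (1 - rho))      # SD of (post - pre) within a subject
    se = sd_change * sqrt(2 / n_per_group)    # SE of the difference in mean change
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / se
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

# Hypothetical grid: differences, SDs and correlations to discuss with the PI.
print("diff   sd  rho  power")
for diff in (2, 4, 6):
    for sd in (8, 10):
        for rho in (0.3, 0.5, 0.7):
            p = power_change_scores(diff, sd, rho, n_per_group=20)
            print(f"{diff:4d} {sd:4d} {rho:4.1f} {p:6.2f}")
```

Showing the whole grid rather than one number makes it easy to see how strongly the answer depends on the assumed correlation, which is the part PIs usually find hardest to guess.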

------------------------------

Paul Thompson

Director, Methodology and Data Analysis Center

Sanford Research/USD

------------------------------

Original Message:

Sent: 05-11-2015 13:32

From: Kim Love-Myers

Subject: Power calculation for a mixed effects model

All--this doesn't help answer Isabella's question, but I am curious about how others approach power analyses of this sort.

Often clients come to me with no concept of the sizes of the effects in which they are interested. Even for something as simple as a t-test, it's very difficult to understand what is a reasonable effect size that will either be seen in the study or would be of interest to the researcher. The majority of my clients cannot or do not want to invest in a pilot study. Sometimes there are previous studies or reports from instrument development that allow me to learn something about the standard deviation we might be able to expect in the measurements--but often all I can do from there is give some table to the client showing either how many subjects they would need for various differences or what power they might have given a particular sample size (with appropriate explanation, of course).
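A table of that kind can be produced with a few lines of Python. The sketch below is only illustrative: it assumes a two-sided, two-sample comparison of means with equal group sizes and a known common standard deviation, and it uses the normal approximation rather than the exact t-based calculation. The function names and the numbers in the example (SD of 10, differences of 2.5 to 10) are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Subjects per group needed to detect `diff` (normal approximation)."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(2 * (z_sum * sd / diff) ** 2)

def power_two_sample(diff, sd, n, alpha=0.05):
    """Approximate two-sided power with n subjects per group."""
    shift = abs(diff) / (sd * sqrt(2 / n))
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(shift - z_crit) + Z.cdf(-shift - z_crit)

sd = 10.0  # hypothetical SD, e.g. taken from an instrument-development report

print("Sample size per group for 80% power")
for diff in (2.5, 5.0, 7.5, 10.0):
    print(f"  difference {diff:5.1f}: n = {n_per_group(diff, sd)}")

print("Power with 30 subjects per group")
for diff in (2.5, 5.0, 7.5, 10.0):
    print(f"  difference {diff:5.1f}: power = {power_two_sample(diff, sd, 30):.2f}")
```

With a difference of 5 and an SD of 10 this gives 63 per group; a t-based calculation asks for slightly more, so the approximation is best treated as a lower bound for small studies.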

If there is nothing available at all, which is often the case, I sometimes turn to generic effect sizes (for example Cohen's d) described as "small", "medium", or "large" in various literature. I will only do this for my clients who NEED a sample size estimate and CAN'T do a pilot study (often academic clients at online universities where I feel the statistical support/understanding/availability of advisors is low, and the clients' thesis/dissertation guidelines are often generic and very highly structured). Of course, I let my clients know this is basically ridiculous, and any conclusions we come to are completely unreliable when we have no understanding of what we are measuring.

A tiny bit of justification here; my other option, based on past experience, is for my clients to turn to online power calculators with no understanding of what they are doing and often using settings for tests that are completely incorrect *sigh*. To me, this is worse; I would rather have an informed client making decisions for herself than an uninformed one making decisions with no knowledge of what she is claiming she can do.

Anyway--what I've mentioned above is even just in the case of a simple t-test. I can't imagine trying to come up with a power or sample size calculation for something as complicated as a three-level mixed effects model. That would involve so many more parameters and variances that need to be estimated in order to give any kind of reasonable answer... Only once in my years of experience have I been able to give what I thought was truly good sample size advice on something as complicated as an unbalanced ANOVA with subsampling, and that was due to a client's previous work. Does anyone have a good experience with this ("good" being defined however you like)?

Kim

------------------------------

Kim Love-Myers

Associate Director, UGA Statistical Consulting Center

------------------------------

Original Message:

Sent: 05-09-2015 18:45

From: Edward Jones

Subject: Power calculation for a mixed effects model

All power calculations start with a model and experimental design. In this case you have a restricted number of subjects, which limits your choice of models for a power analysis.

I would be concerned about carryover effects. If someone has the shift sequence D->E->N->O, their response might be different from someone with the sequence O->N->E->D.

I would use a crossover design with covariates. This can be described as a collection of 6 Latin squares where each square is a 4x4 arrangement balanced for sequence (1, 2, ..., 4) and subject (1, ..., 4).
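One way to picture such an arrangement is the sketch below. It builds a 4x4 Williams square, which is a particular Latin square that is also balanced for first-order carryover, and repeats it 6 times to cover 24 subjects. Choosing a Williams square and reusing the same square for every replicate are assumptions made here for illustration; the post does not say which Latin squares were intended. The labels D, E, N, O stand for the day, evening, night and off shifts.

```python
def williams_square(treatments):
    """Latin square balanced for first-order carryover (even number of treatments)."""
    t = len(treatments)
    if t % 2:
        raise ValueError("this construction needs an even number of treatments")
    # First row interleaves low and high indices: 0, 1, t-1, 2, t-2, ...
    first, low, high = [0], 1, t - 1
    while len(first) < t:
        first.append(low)
        low += 1
        if len(first) < t:
            first.append(high)
            high -= 1
    # Each later row shifts every entry of the first row by one (mod t).
    return [[treatments[(x + shift) % t] for x in first] for shift in range(t)]

square = williams_square(["D", "E", "N", "O"])
for row in square:
    print(" -> ".join(row))

# Hypothetical allocation: 6 replicates of the square for 24 subjects.
assignment = {subject: square[subject % 4] for subject in range(24)}
print("subject 5 sequence:", " -> ".join(assignment[5]))
```

In this square each shift appears once in every period and once in every sequence, and each shift is followed by each other shift exactly once, which speaks directly to the carryover concern raised above.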

With this arrangement, a power analysis can be conducted on the F-test for treatments (shift), where the error degrees of freedom are over 30. That power should be respectable. I would also include a power analysis for comparing the four shifts, which could be done using a t-test. That too should have respectable power.

The main weakness results from a lack of subjects (30 or so) and a large number of possible shift sequences (24). The crossover design assumes no interactions between subjects and sequence, but resolving that requires a larger study.

Best - edwardJones

------------------------------

Edward Jones

Executive Professor

Texas A&M Statistical Services

------------------------------

Original Message:

Sent: 05-09-2015 14:23

From: Isabella Ghement

Subject: Power calculation for a mixed effects model

Hi everyone,

I am working on a power calculation for a mixed effects model and am not sure whether I have formulated the model correctly. I will share what I have in the hope that someone on this list could confirm whether I am going about this in an appropriate way.

The power calculation concerns a study where we know the following:

1. A small number of subjects will be recruited in the study (perhaps 20 or 30) - these will actually be volunteers;

2. All workers are shift workers, who have 4 types of shifts: day, evening, night and off.

3. A sequence of 35 calendar days will be selected for the study (which will cover a combination of the 4 shifts), during which the workers will be administered a cognition test, whose outcomes will be a) completion time and b) total test score.

4. During each shift, the workers will be asked to take the cognition test at the beginning and end of the shift (at least), though the exact times when the test is taken may differ among workers.

5. Other information collected on each worker includes: Age, Gender, Stress Level.

6. It is expected that cognitive function will decline as the shift progresses. Interest lies in testing differences between shifts as well as differences between genders with respect to the rate of decline in cognitive function.

In considering the above, it seems to me that this is a 3-level mixed effects model (though I am not sure, as I don't work with mixed effects models all that often). Is what I am proposing below reasonable?

Level 3:  Subject 1, ..., Subject n
Level 2:  Day 1, Day 2, ..., Day 35 (within each subject)
Level 1:  T1, T2 (within each day)

Test occasions (denoted by T1 and T2) would represent the first level of nesting, followed by calendar days (the second level of nesting), followed by subject (the third level of nesting). T1 and T2 could perhaps be represented as hour of day (?) or "Beginning" and "End" (?).

Calendar Day would be treated as a random factor and "Shift Type" (factor with 4 levels) would be treated as a level-2 predictor (?). But does it make sense to treat calendar day as a random factor when the days are chosen so that a particular sequence of day, evening, night and off shifts is captured? On the other hand, with only 20-30 subjects, maybe this is not unreasonable.

Subject would be treated as a random factor (?). If the subjects are volunteers, how reasonable is this? Age, Gender and Stress Level would be treated as level-3 predictors (?).

Subject and Calendar Day would be treated as crossed factors (?), as the same set of Calendar Days is used for each subject.

For power calculations, would it be reasonable to just consider the simplest model possible - say, one which includes a random effect for subject and no interactions?
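For what it is worth, the "simplest model" question can be explored by simulation. The Python sketch below is only a rough illustration: it assumes a random intercept for subject, independent residual error, no interactions beyond a shift-specific decline, and it analyses per-subject summaries (mean end-minus-start score on night shifts minus the same on day shifts) with a one-sample t-test instead of fitting a mixed model. Every numeric value (decline sizes, standard deviations, 8 days per shift type) is hypothetical, and the t critical value comes from a series approximation so that only the standard library is needed.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def t_critical(df, alpha=0.05):
    """Two-sided t critical value from a Cornish-Fisher series (approximate)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z + (z**3 + z) / (4 * df) + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2)

def mean_decline(rng, subject_effect, decline, sd_resid, n_days):
    """Average end-minus-start score over n_days shifts of one type."""
    total = 0.0
    for _ in range(n_days):
        start = subject_effect + rng.gauss(0, sd_resid)
        end = subject_effect - decline + rng.gauss(0, sd_resid)
        total += end - start
    return total / n_days

def simulate_power(extra_decline, n_subjects=25, days_per_shift=8,
                   base_decline=1.0, sd_subject=5.0, sd_resid=2.0,
                   n_sims=500, alpha=0.05, seed=2015):
    """Share of simulated studies that detect extra decline on night shifts."""
    rng = random.Random(seed)
    crit = t_critical(n_subjects - 1, alpha)
    rejections = 0
    for _ in range(n_sims):
        contrasts = []
        for _ in range(n_subjects):
            u = rng.gauss(0, sd_subject)  # random subject intercept
            day = mean_decline(rng, u, base_decline, sd_resid, days_per_shift)
            night = mean_decline(rng, u, base_decline + extra_decline,
                                 sd_resid, days_per_shift)
            contrasts.append(night - day)
        t_stat = mean(contrasts) / (stdev(contrasts) / sqrt(n_subjects))
        rejections += abs(t_stat) > crit
    return rejections / n_sims

for extra in (0.0, 0.5, 1.0, 1.5):
    print(f"extra decline {extra:.1f}: estimated power {simulate_power(extra):.2f}")
```

Under these assumptions the subject intercept cancels out of every end-minus-start difference, so the between-subject variance has no effect on power here. Adding a random subject-by-shift effect or day-level variation to the simulation would show how much power the simplest model overstates.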

Thank you in advance for any insights you may be able to share.

Isabella

------------------------------

Isabella Ghement

Ghement Statistical Consulting Company Ltd.

------------------------------