
intraclass correlation (ICC) testing?

  • 1.  intraclass correlation (ICC) testing?

    Posted 08-15-2012 18:50
    This message has been cross-posted to the following eGroups: Statistical Consulting Section and Young Professionals Group.
    -------------------------------------------

    Hi all,

    I'm wondering whether ICCs can be used in a hypothesis-testing context, i.e., the investigators wish to design a study in which an ICC of 0.7 can be detected with 80% power.

    What's the right way to think about this? Cheers!


    -------------------------------------------
    Jonathan Moscovici
    -------------------------------------------


  • 2.  RE:intraclass correlation (ICC) testing?

    Posted 08-15-2012 20:43
    Estimating the ICC can be viewed as estimating variance components, and you can use the variance of the estimated variance components to construct the appropriate test. You could then use this approach to determine power/sample size.
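    To make the variance-components view concrete, here is a minimal Python sketch (rather than R) for the simplest one-way random-effects layout: N groups with k measurements each. The function name one_way_icc and the toy data are hypothetical, and the sketch only computes the method-of-moments point estimate from the ANOVA mean squares; it does not construct the test or the power calculation described above.

```python
from statistics import fmean

def one_way_icc(groups):
    """One-way random-effects ICC estimated from ANOVA variance components.

    groups: list of N equal-length lists, each holding the k measurements of one group.
    """
    n, k = len(groups), len(groups[0])
    grand = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]
    msb = k * sum((m - grand) ** 2 for m in group_means) / (n - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g) / (n * (k - 1))
    # Method of moments: E[MSB] = k*var_between + var_within, E[MSW] = var_within.
    # (var_between can come out negative in small samples; it is not truncated here.)
    var_between = (msb - msw) / k
    var_within = msw
    return var_between / (var_between + var_within)

# Toy data, made up for illustration: 3 groups (e.g., twin pairs) of k = 2.
print(round(one_way_icc([[1, 2], [3, 4], [5, 6]]), 3))  # 0.882
```

    The ratio var_between / (var_between + var_within) is algebraically the familiar (MSB - MSW) / (MSB + (k - 1) MSW).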

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 3.  RE:intraclass correlation (ICC) testing?

    Posted 08-15-2012 20:45
    To Robert's point, you can take a look at Shrout and Fleiss (1979), "Intraclass correlations: uses in assessing rater reliability," Psychological Bulletin, 86(2):420-428.

    http://www.ncbi.nlm.nih.gov/pubmed/18839484 


    -------------------------------------------
    Ming Zhou
    -------------------------------------------








  • 4.  RE:intraclass correlation (ICC) testing?

    Posted 08-15-2012 23:51
    Inquiries like this prompt me to ask: What's the question driving the study and the primary analysis?

    For correlations, including ICCs, the critical question usually should be: What is the correlation? If so, then the common (frequentist) analysis calls for computing the estimate and constructing an appropriate confidence interval.

    The sample-size problem is: Given a true (population) ICC and k measurements/members per group, find the number, N, of independent groups required to obtain a confidence interval of a desired full width W. For example, in a study of identical twins, k = 2 and N = the number of twin pairs. Maybe ICC.true = 0.70 and you want a tight CI, so W = 0.20.

    This is dealt with in:

    Bonett, D. G. (2002). Sample size requirements for estimating intraclass correlations with desired precision. Statistics in Medicine, 21(9):1331-1335.

    which is implemented by the Nest() function in the ICC R package:

    > library(ICC)
    > (N <- Nest(est.type = "hypothetical", w=0.20, ICC = 0.70, k = 2,
    + x = NULL, y = NULL, data = NULL, alpha = 0.05))
          2
    0.7 101
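    For readers without R at hand, the closed-form approximation from Bonett (2002), as we understand it, can be sketched in Python. We are assuming this is the formula Nest() evaluates for est.type = "hypothetical"; it does reproduce N = 101 for the twin example. The name bonett_n is hypothetical, and the result should be checked against the paper and the package.

```python
import math
from statistics import NormalDist

def bonett_n(icc, k, w, alpha=0.05):
    """Approximate number of groups N so that the (1 - alpha) CI for the ICC
    has a full width of about w (Bonett 2002 approximation, as we understand it)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n = 8 * z**2 * (1 - icc)**2 * (1 + (k - 1) * icc)**2 / (k * (k - 1) * w**2) + 1
    return math.ceil(n)  # round up to a whole number of groups

print(bonett_n(icc=0.70, k=2, w=0.20))  # 101, matching the Nest() output above
```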

    Most people would probably deal with this by conducting a sample-size analysis to assess the statistical power for rejecting H0: ICC = 0 in favor of Ha: ICC > 0. But if you think critically, there is usually little or no scientific value in knowing that the ICC just differs from 0.00. (In the Bayesian denomination, the ICC would likely have a continuous and smooth prior, so ICC = 0 has positive density but zero probability.)

    The same issue holds when the analysis involves a simple correlation: It is rarely important to know that X and Y have some non-zero correlation. We really should be asking: What is the X,Y correlation?

    The same issue holds for comparing two independent means, proportions, or odds. For example, we could be comparing two odds through the odds ratio, OR, from a 2 x 2 contingency table. H0: OR = 1 is usually what I call a "straw null hypothesis" (one made of straw, so it is easily knocked over). Basing your sample size on the power to reject it just seems rather meaningless to me.

    An excellent commentary along these lines is:

    Connor, J. T. (2004). The value of a p-valueless paper. Am J Gastroenterol, 99(9):1638-40.

    In most studies, one can formulate a given research question in terms of a single "focal" parameter estimate and confidence interval, which provide far more information than the p-value from testing a straw null hypothesis. If so, the corresponding sample-size analysis for study planning should be based on obtaining a CI that has adequate tightness and/or adequate essential power to exclude all values in some "null interval" for the parameter.

    -------------------------------------------
    Ralph O'Brien
    Case Western Reserve University
    -------------------------------------------








  • 5.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 13:28
    Thanks for all the great replies! I looked into the suggested papers.
    The objective of the analysis is to measure the level of agreement among 4 raters using a 5-point rating scale (ordinal in nature).

    The investigators are proposing 100 targets and expect an agreement of 0.8, but they would like to justify that sample size.

    Calculating the width of the C.I. under these conditions would be great.
    Case 2 from the Shrout and Fleiss paper seems to be the most appropriate, and it is implemented in the psych package in R.

    In terms of C.I. calculation, I'm wondering whether the Nest() function in the ICC package is appropriate. For example, I could input k=4, ICC=0.8, and w=0.15, which yields N=54.
    Thus 54 targets, each rated by all 4 raters, should be fine.
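    Solving the same approximation for w gives the full CI width expected for a fixed number of targets, which speaks to how precise the proposed 100 targets would be. This Python sketch assumes Bonett's (2002) sample-size formula, as we understand it, inverted for w; bonett_width is a hypothetical name. The approximation carries whatever distributional assumptions the paper makes, so by itself it says nothing about the 5-point scale.

```python
import math
from statistics import NormalDist

def bonett_width(icc, k, n_targets, alpha=0.05):
    """Approximate full (1 - alpha) CI width for the ICC with n_targets targets
    and k raters, from Bonett's (2002) sample-size formula solved for w."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.sqrt(8 * z**2 * (1 - icc)**2 * (1 + (k - 1) * icc)**2
                     / (k * (k - 1) * (n_targets - 1)))

for n in (54, 100):
    print(n, round(bonett_width(icc=0.8, k=4, n_targets=n), 3))
```

    Under these assumptions, 54 targets give a width just under 0.15, and 100 targets give roughly 0.11.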

    However, I'm confused by this. The Nest() function is based on the Bonett paper, which deals with continuous or dichotomous outcomes (our 5-point scale is neither).

    The Shrout and Fleiss paper seems to be for discrete ordinal scales, but its C.I. formulae are data-dependent: for example, Case 2 requires mean squares from the ANOVA, which makes them hard to use before the data are collected.
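    To show why Case 2 depends on the data, here is a minimal Python sketch of the ICC(2,1) point estimate from the two-way ANOVA mean squares (BMS for targets, JMS for raters, EMS for residual), following the Shrout and Fleiss (1979) notation as we recall it. The 6 x 4 table is the paper's worked example as we remember it (its published ICC(2,1) is about 0.29). The name icc_2_1 is hypothetical, and the confidence interval itself, which needs F quantiles, is not computed here.

```python
from statistics import fmean

def icc_2_1(ratings):
    """Shrout & Fleiss Case 2, ICC(2,1), for an n-targets x k-raters table."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]                        # per target
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]   # per rater
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                 # between-targets mean square
    jms = ss_cols / (k - 1)                 # between-raters (judges) mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# 6 targets x 4 judges: the worked example in Shrout and Fleiss (1979), as we recall it.
sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(sf), 2))  # 0.29
```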


    Would the Bonett method be robust enough for our situation? I'm not sure how I would justify that.

    Thanks very much! This is very interesting stuff.










    -------------------------------------------
    Jonathan Moscovici
    -------------------------------------------








  • 6.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 13:50
    Hi,

    The alternative for discrete (or at least rank-based) data is Kendall's coefficient of concordance. However, in my experience, once the number of raters or targets gets past 10 to 20, the values are usually numerically indistinguishable. So I think you'd be safe not worrying about the 5-point scale.
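    For comparison, Kendall's W can be computed from each rater's ranking of the targets. The Python sketch below is hypothetical (the function names are ours); it assigns average ranks to ties and applies the standard tie correction, which matters on a 5-point scale where ties are common. The input is the small 6-target, 4-judge example from Shrout and Fleiss (1979) as we recall it, used only as convenient data.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values given the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1                # mean of 1-based positions i..j
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def kendalls_w(ratings):
    """Kendall's W for an n-targets x m-raters table, with the usual tie correction."""
    n, m = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n
    tie_term = 0.0
    for j in range(m):
        col = [row[j] for row in ratings]    # one rater's scores across targets
        for i, r in enumerate(average_ranks(col)):
            rank_sums[i] += r
        for v in set(col):                   # sum of (t^3 - t) over tie groups
            t = col.count(v)
            tie_term += t**3 - t
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    return 12 * s / (m**2 * (n**3 - n) - m * tie_term)

sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
      [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(kendalls_w(sf), 3))
```

    Note that W measures agreement in how the raters order the targets, so systematic rater offsets do not lower it the way they lower ICC(2,1).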

    If you really wanted to check it out for sure, particularly if you have any real data at hand, you could always do a simulation.

    Steve

    -------------------------------------------
    Stephan Arndt
    Professor
    University of Iowa, Iowa Consortium
    -------------------------------------------








  • 7.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 14:06
    Oh great! I was not aware of this coefficient. I was familiar with Fleiss' kappa, but that doesn't work for ordered categorical outcomes.

    I am also confident that 100 targets is plenty, but then the question becomes: why would we need so many? That leads to: how many do we realistically need?

    I could resort to simulating data in order to use the Shrout and Fleiss formulae, but I would like to know the limitations of the Bonett formula (e.g., whether it can comfortably be applied to this 5-point scale).

    Thanks!!



    -------------------------------------------
    Jonathan Moscovici
    -------------------------------------------








  • 8.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 15:04
    This illustrates why I find that much study planning is better done via Monte Carlo simulation than with formulas derived under tight distributional assumptions that are rarely met in real work. A well-designed simulation study (not hard to do in R) will let you evaluate the robustness of any method you choose to use with a 5-point ordinal scale, and the same program will tell you what the CI width is for a given N.

    -------------------------------------------
    Ralph O'Brien
    Case Western Reserve University
    -------------------------------------------








  • 9.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 15:13
    Thanks for the reply!

    I'm familiar with computationally intensive methods such as this, but I find that investigators are often less receptive to anything that can be interpreted as "re-inventing the wheel". I'll see what they think.

    That is, they would likely prefer pulling a formula from a paper to asking their statistical consultant to perform simulations (the concepts behind which they do not understand), unless, of course, there is no other recourse.

    I understand that motive, since many grant proposals and papers in applied fields such as the medical sciences are rejected because of overly complicated analysis sections that reviewers have trouble understanding (let alone seeing the need for).




    -------------------------------------------
    Jonathan Moscovici
    -------------------------------------------








  • 10.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 16:15
    I was not referring to a computationally intensive analysis method, such as bootstrapping, although it could be that. I was referring to whatever CI method you are going to use for the analysis. If it is based in traditional math-stat-land, then you can use simulation to investigate its properties when it is applied to population distributions that do not meet its underlying assumptions. Many modern research proposals include statistical planning work (e.g., sample-size analyses) that is based 100% on Monte Carlo simulation. In fact, I cannot imagine a Bayesian adaptive clinical trial being planned without such work.

    Allow me a bit more rhetoric. I will never understand why non-statistician investigators routinely pressure professional statisticians to use outdated or pedestrian approaches when superior approaches exist that are no more costly, merely unfamiliar to most non-statisticians. Moreover, many statisticians immediately succumb to such pressures and thus resign themselves to doing inferior work. I hope that no physician in my life is the type who gives in to patients pressuring them to use older, inferior diagnostic tests and treatments.

    It reminds me of a provocative piece that I read while in grad school: Bross, I. D. J. (1974). The role of the statistician: Scientist or shoe clerk. The American Statistician, 28(4):126-127. Sadly, nearly 40 years later, it is still timely.


    -------------------------------------------
    Ralph O'Brien
    Case Western Reserve University
    -------------------------------------------








  • 11.  RE:intraclass correlation (ICC) testing?

    Posted 08-17-2012 16:31
    Thanks for a great article. I find that many of the same issues arise in collaborative efforts in educational research.

    -------------------------------------------
    Catherine Trapani
    Fordham University, Psychometrics Program
    -------------------------------------------








  • 12.  RE:intraclass correlation (ICC) testing?

    Posted 08-18-2012 10:43
    Hi again,

    Thanks for the article! It was a great read, and made some valid points.

    Regarding the Monte Carlo simulations, I must admit that, aside from bootstrapping variance estimates for the ICC to obtain a bootstrap CI, I am not entirely sure what was intended.

    Furthermore, this might only be temporary amnesia, but I also cannot recall how to easily simulate 5-point-scale data from a two-way random-effects model (in this case, an ordered multinomial mixed-effects model).
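    One common way to do this (not necessarily what was intended above) is to simulate a continuous latent two-way random-effects model and cut it at fixed thresholds, which corresponds to a cumulative-probit model with random target and rater effects. The Python sketch below uses hypothetical settings throughout: variance components giving a latent ICC(2,1) of 0.80, four roughly evenly spread thresholds, 54 targets, 4 raters, 500 replicates, and a fixed seed. On each simulated 5-point table it re-estimates ICC(2,1) from the ANOVA mean squares; the function names are ours.

```python
import random
from statistics import fmean, median, quantiles

def simulate_ordinal(n, k, sd_target, sd_rater, sd_error, cuts, rng):
    """Simulate an n-targets x k-raters table on a 1..len(cuts)+1 ordinal scale by
    thresholding a latent two-way random-effects model:
        latent_ij = target_i + rater_j + error_ij
    """
    targets = [rng.gauss(0, sd_target) for _ in range(n)]
    raters = [rng.gauss(0, sd_rater) for _ in range(k)]
    table = []
    for t in targets:
        row = []
        for r in raters:
            latent = t + r + rng.gauss(0, sd_error)
            # Category = 1 + number of thresholds the latent score exceeds.
            row.append(1 + sum(latent > c for c in cuts))
        table.append(row)
    return table

def icc_2_1(ratings):
    """Shrout & Fleiss Case 2, ICC(2,1), from two-way ANOVA mean squares."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    bms = ss_rows / (n - 1)
    jms = ss_cols / (k - 1)
    ems = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)

# Hypothetical settings: latent ICC(2,1) = 0.80 / (0.80 + 0.05 + 0.15) = 0.80.
SD_TARGET, SD_RATER, SD_ERROR = 0.80 ** 0.5, 0.05 ** 0.5, 0.15 ** 0.5
CUTS = [-1.0, -0.35, 0.35, 1.0]          # 4 thresholds -> 5 categories
rng = random.Random(2012)                 # fixed seed for reproducibility

estimates = [
    icc_2_1(simulate_ordinal(54, 4, SD_TARGET, SD_RATER, SD_ERROR, CUTS, rng))
    for _ in range(500)
]
cut_points = quantiles(estimates, n=40)   # 39 cut points at 2.5%, 5%, ..., 97.5%
lo, hi = cut_points[0], cut_points[-1]
med = median(estimates)
print(f"median ICC(2,1) {med:.3f}; middle 95% of estimates {lo:.3f} to {hi:.3f}")
```

    The middle 95% of the simulated estimates is only a rough proxy for CI width. Applying the planned CI method to each simulated table and counting how often it covers the target value would check its behavior on ordinal data. Expect the estimates to sit somewhat below the latent 0.80, because coarsening to 5 categories attenuates agreement.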




    -------------------------------------------
    Jonathan Moscovici
    -------------------------------------------








  • 13.  RE:intraclass correlation (ICC) testing?

    Posted 08-15-2012 22:30
    For something like an ICC, testing against a null hypothesis doesn't make much sense. With any reasonable sample size, you'll end up rejecting the null, even for trivially small ICCs. A better approach is to place a confidence interval (CI) around the ICC. What matters is not how far the ICC is from zero, but how close it is to 1.0.

    So design your study so that the ICC has a reasonably narrow CI. The formulas for the confidence interval of an ICC are fairly well known. The width does change as the ICC changes, so make sure you have adequate precision for a small ICC (say, around 0.3), a moderate ICC (maybe around 0.5 or 0.6), and a large ICC (perhaps around 0.8).
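    As a quick illustration of that point, the Python sketch below uses Bonett's (2002) approximation, as we understand it (our choice; no specific formula is named here), to find the number of targets needed for a full 95% CI width of 0.20 at several ICC values. The k = 4 raters and w = 0.20 are assumptions borrowed from elsewhere in this thread, and bonett_n is a hypothetical name.

```python
import math
from statistics import NormalDist

def bonett_n(icc, k, w, alpha=0.05):
    """Approximate number of targets for a (1 - alpha) ICC CI of full width w."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(8 * z**2 * (1 - icc)**2 * (1 + (k - 1) * icc)**2
                     / (k * (k - 1) * w**2) + 1)

for icc in (0.3, 0.5, 0.6, 0.8):
    print(icc, bonett_n(icc, k=4, w=0.20))
```

    Under these assumptions the required number of targets falls from 115 at ICC = 0.3 to 31 at ICC = 0.8, so in this range the smaller ICCs drive the sample size.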

    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------