To help illustrate one way I'm thinking about this problem, below I repeat my previous example and add six versions of information we might have about the desired correlation. The first version is ideal, and the rest are more like the situation of interest to me -- especially Version 6. Some of the versions might seem quite contrived; they're in the spirit of a Bayesian prior. For each version -- that is, the previous example with certain additional information -- my question is, how can we use the given information to make inferences about the mean difference (e.g., hypothesis test, confidence or credible interval)? I'd appreciate any further ideas.
Previous example: Suppose 11 pairs of subjects contributed scores for samples A and B (e.g., the same subjects measured under conditions A and B, or subjects in group A matched with subjects from group B), but we're given only the samples' size (n = 11), means (MA = 5, MB = 7), and variances (VA = 12, VB = 10). Note that the correlation between samples, say r, is not available; let's assume that if it were available a conventional related-samples t test would be appropriate for inference about the mean difference, and let's use rho to denote that sample correlation's estimand.
Version 1: Suppose we also have the sample correlation for these data, say r = .6. Then we can simply compute the difference scores' sample variance as
VA + VB - 2(r)SQRT(VA*VB) = 12 + 10 - 2(.6)SQRT(12*10) = 8.855
and, from this, the sample mean difference's standard error (SE) as SQRT(8.855 / 11) = SQRT(0.805) = 0.897; we can use this SE for inference based on a t distribution with n - 1 = 10 degrees of freedom. This version is ideal but isn't the situation I'm interested in.
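The Version 1 arithmetic can be sketched in a few lines of Python. This is only an illustration of the computation described above, using the example's summary statistics; the function name paired_se and the hard-coded critical value (the tabled two-sided 95% point of t with 10 df, about 2.228) are my own assumptions, not part of the original post.

```python
import math

def paired_se(var_a, var_b, r, n):
    """Return (variance of difference scores, SE of the mean difference).

    Hypothetical helper: uses Var(D) = VA + VB - 2*r*sqrt(VA*VB).
    """
    var_diff = var_a + var_b - 2.0 * r * math.sqrt(var_a * var_b)
    return var_diff, math.sqrt(var_diff / n)

# Summary statistics from the example
n, mean_a, mean_b = 11, 5.0, 7.0
var_a, var_b, r = 12.0, 10.0, 0.6

var_diff, se = paired_se(var_a, var_b, r, n)
mean_diff = mean_b - mean_a
t_stat = mean_diff / se  # refer to t with n - 1 = 10 df

# Assumed tabled value: two-sided 95% critical point of t with 10 df
T_CRIT_10 = 2.228
ci = (mean_diff - T_CRIT_10 * se, mean_diff + T_CRIT_10 * se)

print(f"var_diff = {var_diff:.3f}, SE = {se:.3f}, t = {t_stat:.3f}")
print(f"95% CI for mean difference: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that 8.855 / 11 is the squared SE; the SE itself is the square root of that quotient.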
Version 2: Suppose we also know the correlation parameter, say rho = .6. How can we make the desired inferences using the focal sample's summary statistics together with rho? I suspect it's not appropriate to simply plug this correlation parameter into the above expression for the focal sample's variance of difference scores and proceed with the usual t-based inferences.
Version 3: Suppose we also have the sample correlation from a different, independent sample, and let's assume this sample correlation also estimates rho. How can we make the desired inferences using the focal sample's summary statistics together with this independent estimate of rho?
Version 4: Suppose we also believe that the sample correlation for these data is either r = .5 (with probability .6) or r = .8 (with probability .4). How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on r? Can we somehow use each correlation value to obtain SEs and then combine these over the two values -- like we might when combining results over multiple imputations in certain missing-data problems?
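To make the Version 4 idea concrete, here is one naive reading of "combine the SEs over the two values", sketched in Python. It is an assumption on my part, not an established procedure: because the mean difference does not depend on r, the between-value variance of the point estimate is zero, so a multiple-imputation-style total variance reduces to the probability-weighted average of the squared SEs. What reference distribution (and degrees of freedom) should go with the result is exactly the open question, and the sketch does not settle it.

```python
import math

# Summary statistics from the example
n, var_a, var_b = 11, 12.0, 10.0

# Two-point belief about the sample correlation r: value -> probability
r_dist = {0.5: 0.6, 0.8: 0.4}

def se_squared(r):
    """Squared SE of the mean difference for a given correlation r."""
    var_diff = var_a + var_b - 2.0 * r * math.sqrt(var_a * var_b)
    return var_diff / n

# Squared SE under each candidate r
se2_by_r = {r: se_squared(r) for r in r_dist}

# Probability-weighted average of the squared SEs (a naive combination,
# valid as a total variance only because the point estimate is the same
# for every r)
combined_var = sum(p * se2_by_r[r] for r, p in r_dist.items())
combined_se = math.sqrt(combined_var)

for r, se2 in se2_by_r.items():
    print(f"r = {r}: SE = {math.sqrt(se2):.3f}")
print(f"combined SE = {combined_se:.3f}")
```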
Version 5: Suppose we also believe that the correlation parameter is either rho = .5 (with probability .6) or rho = .8 (with probability .4). How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on rho?
Version 6: Suppose we also believe that the correlation parameter follows a specific parametric distribution, such as rho ~ Beta(alpha, beta). How can we make the desired inferences using the focal sample's summary statistics together with this continuous probability distribution on rho?
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:57
From: Richard Bittman
Subject: Comparing related samples' means without correlation
I like Nayak's response. I encounter this often when doing sample size calculations for change from baseline, and the only data are summaries in journal articles that provide the baseline mean and standard deviation, the endpoint mean and standard deviation but not the correlation between baseline and endpoint or the standard deviation of the subject-specific differences. There's no choice but to make some assumptions, and I often assume the correlation is +0.5. Sensitivity analyses can also help.
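A sensitivity analysis of this kind might look like the following Python sketch. The summary statistics are borrowed from Adam's example (n = 11, VA = 12, VB = 10) purely for illustration, and the grid of assumed correlations is arbitrary; neither is a recommendation.

```python
import math

# Borrowed from Adam's example; substitute the published summaries
n, var_a, var_b = 11, 12.0, 10.0

def sd_diff(r):
    """SD of subject-specific differences under an assumed correlation r."""
    return math.sqrt(var_a + var_b - 2.0 * r * math.sqrt(var_a * var_b))

# Arbitrary grid of assumed correlations, including the common r = 0.5
assumed_r = [0.0, 0.3, 0.5, 0.7, 0.9]

# (assumed r, SD of differences, SE of mean difference) for each grid point
results = [(r, sd_diff(r), sd_diff(r) / math.sqrt(n)) for r in assumed_r]

print(" r    SD(diff)  SE(mean diff)")
for r, sd, se in results:
    print(f"{r:.1f}   {sd:7.3f}   {se:7.3f}")
```

The SE shrinks as the assumed correlation grows, so treating the samples as uncorrelated (r = 0) gives the largest SE among nonnegative correlations.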
-------------------------------------------
Dick Bittman
President
Bittman Biostat, Inc.
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:36
From: Michael Chernick
Subject: Comparing related samples' means without correlation
You can ignore my mention of meta analysis because most of my comments pertain to your situation. You say let's assume that a t test would be appropriate if we had the raw data. That is very hypothetical. Without the raw data, how could you know that the t test would be appropriate? Nayak's suggestion would be okay if his assumption holds that the sum of the variances is much larger than twice the covariance. But if the correlation is high, that would not be the case and the test would be much too conservative.
I would not feel comfortable doing any kind of inference when looking at mean differences with only the mean and variance estimates specified. I would still feel uncomfortable if the estimate of covariance (or correlation) is also specified. Even though you can then estimate the variance for the mean difference I would not be sure that the t test would be appropriate without seeing the raw data.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:21
From: Adam Hafdahl
Subject: Comparing related samples' means without correlation
As I said in my original post, let's assume that a conventional related-sample t test would be appropriate if we had the raw data or the correlation. In other words, for the question I'm asking let's ignore any "messiness" that might exist in the raw data, though that'd certainly be important to consider in a practical application. I'm ignoring this because I'd like to focus on solving the problem I posed.
Also, to reiterate, for the sake of this discussion I'm not interested in the meta-analytic aspect I mentioned in my original post. I'm familiar with several techniques for combining p values, but that's not pertinent to the question I posed (unless I'm missing something).
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
Original Message:
Sent: 03-31-2012 22:22
From: Michael Chernick
Subject: Comparing related samples' means without correlation
A problem with doing inference when you only have summary statistics is that you can't check the distributional assumptions. Aside from that, there is no way of knowing how conservative the recommended test is when it uses an overestimated standard deviation. Meta analyses that combine p-values are common.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------