To help illustrate one way I'm thinking about this problem, below I repeat my previous example and add six versions of information we might have about the desired correlation. The first version is ideal, and the rest are more like the situation of interest to me -- especially Version 6. Some of the versions might seem quite contrived; they're in the spirit of a Bayesian prior. For each version -- that is, the previous example with certain additional information -- my question is, how can we use the given information to make inferences about the mean difference (e.g., hypothesis test, confidence or credible interval)? I'd appreciate any further ideas.
Previous example: Suppose 11 pairs of subjects contributed scores for samples A and B (e.g., the same subjects measured under conditions A and B, or subjects in group A matched with subjects from group B), but we're given only the samples' size (n = 11), means (M_A = 5, M_B = 7), and variances (V_A = 12, V_B = 10). Note that the correlation between samples, say r, is not available; let's assume that if it were available a conventional related-samples t test would be appropriate for inference about the mean difference, and let's use rho to denote that sample correlation's estimand.
Version 1: Suppose we also have the sample correlation for these data, say r = .6. Then we can simply compute the difference scores' sample variance as 12 + 10 - 2(.6)SQRT(12*10) = 8.855 and, from this, the sample mean difference's standard error (SE) as SQRT(8.855 / 11) = 0.897; we can use this SE for inference based on a t distribution with 10 degrees of freedom. This version is ideal but isn't the situation I'm interested in.
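For concreteness, Version 1's computation can be sketched in Python with the running example's values (the critical value 2.228 is t_{.975} with 10 df, as one would read from a t table):

```python
import math

n, M_A, M_B = 11, 5.0, 7.0
V_A, V_B, r = 12.0, 10.0, 0.6

# Sample variance of the difference scores implied by the correlation
V_D = V_A + V_B - 2 * r * math.sqrt(V_A * V_B)   # = 8.855
se = math.sqrt(V_D / n)                          # SE of the mean difference
t = (M_A - M_B) / se                             # compare to t with n - 1 = 10 df
half_width = 2.228 * se                          # t_{.975, 10} = 2.228
ci = (M_A - M_B - half_width, M_A - M_B + half_width)
```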
Version 2: Suppose we also know the correlation parameter, say rho = .6. How can we make the desired inferences using the focal sample's summary statistics together with rho? I suspect it's not appropriate to simply plug this correlation parameter into the above expression for the focal sample's variance of difference scores and proceed with the usual t-based inferences.
Version 3: Suppose we also have the sample correlation from a different, independent sample, and let's assume this sample correlation also estimates rho. How can we make the desired inferences using the focal sample's summary statistics together with this independent estimate of rho?
Version 4: Suppose we also believe that the sample correlation for these data is either r = .5 (with probability .6) or r = .8 (with probability .4). How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on r? Can we somehow use each correlation value to obtain SEs and then combine these over the two values -- like we might when combining results over multiple imputations in certain missing-data problems?
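One hypothetical way to operationalize the multiple-imputation analogy is to compute the implied variance of the mean difference at each correlation value and pool with Rubin's-rules-style weighting. Note that because the point estimate M_A - M_B doesn't depend on r, the between-imputation variance vanishes and pooling reduces to a probability-weighted average of the within variances; whether this is a defensible inferential procedure is exactly the open question:

```python
import math

n, M_A, M_B, V_A, V_B = 11, 5.0, 7.0, 12.0, 10.0
d = M_A - M_B
points = [(0.5, 0.6), (0.8, 0.4)]   # (r value, probability)

# Within-"imputation" variance of the mean difference at each r,
# averaged with the given probabilities (a Rubin's-rules-style sketch)
within = sum(p * (V_A + V_B - 2 * r * math.sqrt(V_A * V_B)) / n
             for r, p in points)
# The point estimate d is the same at both r values, so the between
# component is zero and the combined SE is just sqrt(within)
se_combined = math.sqrt(within)
```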
Version 5: Suppose we also believe that the correlation parameter is either rho = .5 (with probability .6) or rho = .8 (with probability .4). How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on rho?
Version 6: Suppose we also believe that the correlation parameter follows a specific parametric distribution, such as rho ~ Beta(alpha, beta). How can we make the desired inferences using the focal sample's summary statistics together with this continuous probability distribution on rho?
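For Version 6, one unvalidated Monte Carlo sketch of propagating the prior: draw rho from the Beta distribution, compute the implied SE at each draw, and mix t-distributed errors over the draws to get an interval that reflects uncertainty about rho. The Beta(8, 4) choice below is purely hypothetical (mean 2/3), and this is an illustration of the idea, not an established procedure:

```python
import math
import random

random.seed(0)
n, M_A, M_B, V_A, V_B = 11, 5.0, 7.0, 12.0, 10.0
d = M_A - M_B
a, b = 8.0, 4.0     # hypothetical Beta prior on rho, mean a/(a+b) = 2/3
df = n - 1

draws = []
for _ in range(50_000):
    rho = random.betavariate(a, b)
    se = math.sqrt((V_A + V_B - 2 * rho * math.sqrt(V_A * V_B)) / n)
    # Student-t variate via normal / sqrt(chi-square / df)
    z = random.gauss(0.0, 1.0)
    chi2 = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))
    draws.append(d + se * z / math.sqrt(chi2 / df))

draws.sort()
lo = draws[int(0.025 * len(draws))]   # rough 95% interval endpoints
hi = draws[int(0.975 * len(draws))]
```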
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:57
From: Richard Bittman
Subject: Comparing related samples' means without correlation
I like Nayak's response. I encounter this often when doing sample size calculations for change from baseline, and the only data are summaries in journal articles that provide the baseline mean and standard deviation, the endpoint mean and standard deviation but not the correlation between baseline and endpoint or the standard deviation of the subject-specific differences. There's no choice but to make some assumptions, and I often assume the correlation is +0.5. Sensitivity analyses can also help.
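Under the assumed r = +0.5, the implied SD of the subject-specific changes has a convenient form: with equal baseline and endpoint SDs it equals the common SD. A small helper (hypothetical name) sketches this for sample-size work:

```python
import math

def sd_of_change(sd1, sd2, r=0.5):
    """SD of within-subject change implied by an assumed correlation r."""
    return math.sqrt(sd1**2 + sd2**2 - 2 * r * sd1 * sd2)

# With sd1 = sd2 and r = 0.5, the change SD equals the common SD;
# larger assumed correlations shrink it, smaller ones inflate it
change_sd = sd_of_change(10.0, 10.0)
```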
-------------------------------------------
Dick Bittman
President
Bittman Biostat, Inc.
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:36
From: Michael Chernick
Subject: Comparing related samples' means without correlation
You can ignore my mention of meta analysis because most of my comments pertain to your situation. You say let's assume that a t test would be appropriate if we had the raw data. That is very hypothetical. Without the raw data how could you know that the t test would be appropriate? Nayak's suggestion would be okay if his assumption holds that the sum of the variances is much larger than twice the covariance. But if the correlation is high that would not be the case and the test would be much too conservative.
I would not feel comfortable doing any kind of inference when looking at mean differences with only the mean and variance estimates specified. I would still feel uncomfortable if the estimate of covariance (or correlation) is also specified. Even though you can then estimate the variance for the mean difference I would not be sure that the t test would be appropriate without seeing the raw data.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 04-01-2012 11:21
From: Adam Hafdahl
Subject: Comparing related samples' means without correlation
As I said in my original post, let's assume that a conventional related-sample t test would be appropriate if we had the raw data or the correlation. In other words, for the question I'm asking let's ignore any "messiness" that might exist in the raw data, though that'd certainly be important to consider in a practical application. I'm ignoring this because I'd like to focus on solving the problem I posed.
Also, to reiterate, for the sake of this discussion I'm not interested in the meta-analytic aspect I mentioned in my original post. I'm familiar with several techniques for combining p values, but that's not pertinent to the question I posed (unless I'm missing something).
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
Original Message:
Sent: 03-31-2012 22:22
From: Michael Chernick
Subject: Comparing related samples' means without correlation
A problem with doing inference when you only have summary statistics is that you can't check the distributional assumptions. Aside from that there is no way of knowing how conservative the recommended test is using an overestimated standard deviation. Meta analyses that combine p-values are common.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 03-31-2012 22:06
From: Nayak Polissar
Subject: Comparing related samples' means without correlation
Dear Adam,
The situation is not hopeless. You can get an approximate solution. If the two means for the paired data are Xbar1 and Xbar2 with variances V1 and V2, respectively, you can proceed as follows. The difference in means, Delta, is Delta = Xbar1 - Xbar2 with variance:
Var(Delta) = V1 + V2 - 2*Cov(Xbar1, Xbar2). (Note that V1 is the variance of a mean, so the variance of the individual measurements should be divided by n. Same comment for V2.)
According to you we know V1 and V2, but we don't know the covariance. However, for many situations (e.g., people tested twice on the same test) the covariance is positive. In that case V1 + V2 is a conservatively large variance for Delta, and t = (Xbar1-Xbar2)/[sqrt(V1+V2)] is a conservatively small magnitude of t, which you can look up in a t table with the appropriate degrees of freedom.
Alternatively, if you are not prepared to assume that the first measurement and the second measurement are positively correlated, you could do a sensitivity analysis by varying the value of the correlation (and implied covariance) across a plausible range to see what the impact is on your inference. I did not write out all the statistical algebra for these alternative calculations, but it is not difficult.
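Using the thread's running example (and dividing the sample variances by n, since V1 and V2 above denote variances of the means), the conservative bound and the sensitivity analysis might be sketched as:

```python
import math

n, M_A, M_B, V_A, V_B = 11, 5.0, 7.0, 12.0, 10.0
d = M_A - M_B

# Conservative bound: drop the (assumed positive) covariance term entirely
se_cons = math.sqrt((V_A + V_B) / n)
t_cons = d / se_cons     # conservatively small |t|; compare to t table, 10 df

# Sensitivity analysis: recompute the SE and t over a plausible rho range
results = []
for rho in (0.0, 0.3, 0.5, 0.7, 0.9):
    se = math.sqrt((V_A + V_B - 2 * rho * math.sqrt(V_A * V_B)) / n)
    results.append((rho, se, d / se))
```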
It is important to note that the observed difference, Delta = Xbar1-Xbar2, is your estimate of the difference, so if you are pooling things across studies, as in meta-analysis, you can proceed with that value. If you use these conservatively large variances for studies where you do not have the SD or variance of the differences, then your final conclusions will be conservative (i.e., your p-values will be too big), but you will have an appropriate pooled effect size.
By the way, I would be surprised if the population covariance (and correlation) between the two measurements is not positive for many, many situations where people are measured twice.
Is this helpful? What else do we have to do on a Saturday afternoon!
Best wishes,
Nayak
-------------------------------------------
Nayak Polissar
Principal Statistician
The Mountain-Whisper-Light Statistics
-------------------------------------------
Original Message:
Sent: 03-31-2012 18:33
From: Adam Hafdahl
Subject: Comparing related samples' means without correlation
I suspect you misunderstood my question, Michael, perhaps because I wasn't clear about the setup. I'll reiterate a key point: Suppose we don't have raw data from the paired samples; we have only the sample size (common to both samples) and each sample's mean and variance. So, we can't compute the paired differences' variance directly. Does that make sense? (With a binary outcome, where we might use McNemar's test, suppose we knew the 2 [marginal] success proportions but none of the 2 x 2 table's counts.)
Here's an example with a continuous outcome: Suppose 11 pairs of subjects contributed scores for samples A and B (e.g., the same subjects measured under conditions A and B, or subjects in group A matched with subjects from group B), but we're given only the samples' size (n = 11), means (M_A = 5, M_B = 7), and variances (V_A = 12, V_B = 10). My question is, how can we use these descriptive stats -- without this sample's raw scores or correlation -- to make inferences about the mean difference?
This might seem like an unusual situation, but it's not uncommon in meta-analyses of literature/aggregate data, where we have to rely on whatever's reported in a primary study. In some research domains authors rarely report the correlation between related samples' scores.
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------
Original Message:
Sent: 03-31-2012 14:45
From: Michael Chernick
Subject: Comparing related samples' means without correlation
I would think you would know about paired tests. So my comment here may be too elementary and not to your point. There are tests that can be conducted when variables are correlated without having to know or estimate the correlation coefficient. The paired t test and McNemar's test are examples. In the case of the paired t test the individual paired observations are correlated but their paired differences are independent and identically distributed. So taking the average of the paired differences allows one to apply the one-sample t test of the null hypothesis that the mean difference is zero. Of course the sample size has to be the same for both groups. In the case of unequal sample size one could take the smaller of the two sample sizes and construct pairs by some sort of matching method. The paired t test could then be applied if you are willing to assume that the paired differences have the same variance.
With the paired t test it is true that the variance of the difference is the sum of the individual variances minus twice the covariance. But it does not need to be estimated from estimates of the component variances and the covariance. It is estimated directly by the sample variance of the paired differences.
-------------------------------------------
Michael Chernick
Director of Biostatistical Services
Lankenau Institute for Medical Research
-------------------------------------------
Original Message:
Sent: 03-31-2012 13:57
From: Adam Hafdahl
Subject: Comparing related samples' means without correlation
I'd appreciate any thoughts about the following inference problem, which is a simplified version of a client's considerably more complicated actual problem.
Suppose we'd like to make statistical inferences about the difference between two means based on data from related samples (e.g., the same subjects measured in 2 conditions), and we have the sample size and both sample means and variances but not the sample correlation. Let's assume that if we had the sample correlation a conventional related-samples t test would be appropriate.
Is there a way to test the mean difference or construct a confidence (or credible) interval for it by either putting a prior on the correlation (parameter), using other info about the correlation (e.g., another sample's estimate), both, or something else? Although I'm curious about a fully Bayesian approach, I'd prefer a classical/frequentist strategy, which I think might be easier to extend to the more complicated actual problem (and explain to my client). For instance, can we obtain a reasonable standard error for the mean difference that incorporates uncertainty about the correlation?
A note on the client's actual problem: It's a meta-analysis in which each of relatively few studies contributes several multiple-endpoint comparisons between pretest-adjusted posttest means, and we have scant info about correlations between endpoints or between pre- and posttest measurements. I mention this only as context, not to solicit advice about it.
Cheers,
AH
-------------------------------------------
Adam Hafdahl
Statistical Consultant
ARCH Statistical Consulting, LLC
-------------------------------------------