Discussion: View Thread

Comparing related samples' means without correlation

  • 1.  Comparing related samples' means without correlation

    Posted 03-31-2012 13:57
    I'd appreciate any thoughts about the following inference problem, which is a simplified version of a client's considerably more complicated actual problem.

    Suppose we'd like to make statistical inferences about the difference between two means based on data from related samples (e.g., the same subjects measured in 2 conditions), and we have the sample size and both sample means and variances but not the sample correlation.  Let's assume that if we had the sample correlation a conventional related-samples t test would be appropriate.

    Is there a way to test the mean difference or construct a confidence (or credible) interval for it by either putting a prior on the correlation (parameter), using other info about the correlation (e.g., another sample's estimate), both, or something else?  Although I'm curious about a fully Bayesian approach, I'd prefer a classical/frequentist strategy, which I think might be easier to extend to the more complicated actual problem (and explain to my client).  For instance, can we obtain a reasonable standard error for the mean difference that incorporates uncertainty about the correlation?

    A note on the client's actual problem: It's a meta-analysis in which each of relatively few studies contributes several multiple-endpoint comparisons between pretest-adjusted posttest means, and we have scant info about correlations between endpoints or between pre- and posttest measurements.  I mention this only as context, not to solicit advice about it.


    Cheers,

    AH

    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 2.  RE:Comparing related samples' means without correlation

    Posted 03-31-2012 14:46
    I would think you would know about paired tests, so my comment here may be too elementary and not to your point.  There are tests that can be conducted when variables are correlated without having to know or estimate the correlation coefficient; the paired t test and McNemar's test are examples.  In the case of the paired t test the individual paired observations are correlated, but their paired differences are independent and identically distributed, so taking the average of the paired differences allows one to apply the one-sample t test of the null hypothesis that the mean difference is zero.  Of course the sample size has to be the same for both groups.  In the case of unequal sample sizes one could take the smaller of the two sample sizes and construct pairs by some sort of matching method.  The paired t test could then be applied if you are willing to assume that the paired differences have the same variance.

    With the paired t test it is true that the variance of the difference is the sum of the individual variances minus twice the covariance.  But it does not need to be estimated from estimates of the component variances and the covariance; it is estimated directly by the sample variance of the paired differences.
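    To make Michael's point concrete, here's a small sketch (with made-up data, not from any study discussed here) showing that the paired t test is identical to a one-sample t test on the within-pair differences, so the correlation never needs to be estimated separately:

```python
# Illustration only: the paired t test equals a one-sample t test on
# the within-pair differences, so no correlation estimate is needed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(5, 3, size=11)             # condition A scores (simulated)
b = a * 0.6 + rng.normal(3, 2, size=11)   # correlated condition B scores

d = b - a                                  # paired differences
t_paired, p_paired = stats.ttest_rel(b, a)
t_onesamp, p_onesamp = stats.ttest_1samp(d, 0.0)

# The two tests give identical results.
assert np.isclose(t_paired, t_onesamp) and np.isclose(p_paired, p_onesamp)
```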

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 3.  RE:Comparing related samples' means without correlation

    Posted 03-31-2012 18:34
    I suspect you misunderstood my question, Michael, perhaps because I wasn't clear about the setup.  I'll reiterate a key point: Suppose we don't have raw data from the paired samples; we have only the sample size (common to both samples) and each sample's mean and variance.  So, we can't compute the paired differences' variance directly. Does that make sense?  (With a binary outcome, where we might use McNemar's test, suppose we knew the 2 [marginal] success proportions but none of the 2 x 2 table's counts.)

    Here's an example with a continuous outcome: Suppose 11 pairs of subjects contributed scores for samples A and B (e.g., the same subjects measured under conditions A and B, or subjects in group A matched with subjects from group B), but we're given only the samples' size (n = 11), means (M_A = 5, M_B = 7), and variances (V_A = 12, V_B = 10).  My question is, how can we use these descriptive stats -- without this sample's raw scores or correlation -- to make inferences about the mean difference?

    This might seem like an unusual situation, but it's not uncommon in meta-analyses of literature/aggregate data, where we have to rely on whatever's reported in a primary study.  In some research domains authors rarely report the correlation between related samples' scores.


    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 4.  RE:Comparing related samples' means without correlation

    Posted 03-31-2012 18:47
    This is not a situation where you can combine data.  In such situations the individual studies usually report p-values, and you can then do a p-value combination test; Fisher's combination method is one that you can use.  Without the raw data there is not much more that you can do.


    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 5.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 11:13
    Let's forget I mentioned "meta-analysis," because that seems to be distracting you (Michael) from my main question, which is simply about one study's data.  What I'm really interested in is the problem I posed: how to make inferences about a difference between paired samples' means when we have only the sample size, means, and variances but not the raw data or correlation.


    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 6.  RE:Comparing related samples' means without correlation

    Posted 03-31-2012 22:07
    Dear Adam,

    The situation is not hopeless. You can get an approximate solution. If the two means for the paired data are Xbar1 and Xbar2 with variances V1 and V2, respectively, you can proceed as follows. The difference in means is Delta = Xbar1 - Xbar2, with variance:

    Var(Delta) = V1 + V2 - 2*Cov(Xbar1,Xbar2). (Note that V1 is the variance of a mean, so the variance of the individual measurements should be divided by n. Same comment for V2.)

    According to you we know V1 and V2, but we don't know the covariance. However, for many situations (e.g., people tested twice on the same test) the covariance is positive. In that case V1 + V2 is a conservatively large variance for Delta, and t = (Xbar1-Xbar2)/[sqrt(V1+V2)] is a conservatively small magnitude of t, which you can look up in a t table with the appropriate degrees of freedom. 

    Alternatively, if you are not prepared to assume that the first and second measurements are positively correlated, you could do a sensitivity analysis by varying the value of the correlation (and implied covariance) across a plausible range to see what the impact is on your inference. I did not write out all the statistical algebra for these alternative calculations, but it is not difficult.
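    The sensitivity analysis Nayak suggests can be sketched as follows, using n = 11, means 5 and 7, and variances 12 and 10 from Adam's later example (an illustration, not a prescribed method):

```python
# Sensitivity analysis: recompute the paired-t p value across a plausible
# range of correlations, given only n and the two samples' means/variances.
import math
from scipy import stats

n, m_a, m_b, v_a, v_b = 11, 5.0, 7.0, 12.0, 10.0

def paired_p(r):
    """Two-sided p for the mean difference, assuming correlation r."""
    v_d = v_a + v_b - 2 * r * math.sqrt(v_a * v_b)  # variance of differences
    se = math.sqrt(v_d / n)                          # SE of mean difference
    t = (m_b - m_a) / se
    return 2 * stats.t.sf(abs(t), df=n - 1)

for r in [0.0, 0.3, 0.5, 0.8]:
    print(f"r = {r:.1f}  p = {paired_p(r):.4f}")
```

As Nayak notes, r = 0 gives the conservative (largest-variance) case when the true correlation is positive.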

    It is important to note that the observed difference, Delta = Xbar1 - Xbar2, is your estimate of the difference, so if you are pooling things across studies, as in meta-analysis, you can proceed with that value. If you use these conservatively large variances for studies where you do not have the SD or variance of the differences, then your final conclusions will be conservative (i.e., your p-values will be too big), but you will have an appropriate pooled effect size. 

    By the way, I would be surprised if the population covariance (and correlation) between the two measurements is not positive for many, many situations where people are measured twice. 

    Is this helpful? What else do we have to do on a Saturday afternoon!

    Best wishes,

    Nayak


    -------------------------------------------
    Nayak Polissar
    Principal Statistician
    The Mountain-Whisper-Light Statistics
    -------------------------------------------


  • 7.  RE:Comparing related samples' means without correlation

    Posted 03-31-2012 22:23
    A problem with doing inference when you only have summary statistics is that you can't check the distributional assumptions.  Aside from that, there is no way of knowing how conservative the recommended test is when it uses an overestimated standard deviation.  Meta-analyses that combine p-values are common.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 8.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 11:21
    As I said in my original post, let's assume that a conventional related-sample t test would be appropriate if we had the raw data or the correlation.  In other words, for the question I'm asking let's ignore any "messiness" that might exist in the raw data, though that'd certainly be important to consider in a practical application.  I'm ignoring this because I'd like to focus on solving the problem I posed.

    Also, to reiterate, for the sake of this discussion I'm not interested in the meta-analytic aspect I mentioned in my original post.  I'm familiar with several techniques for combining p values, but that's not pertinent to the question I posed (unless I'm missing something).


    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 9.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 11:37

    You can ignore my mention of meta-analysis because most of my comments pertain to your situation.  You say let's assume that a t test would be appropriate if we had the raw data.  That is very hypothetical; without the raw data how could you know that the t test would be appropriate?  Nayak's suggestion would be okay if his assumption holds that the sum of the variances is much larger than twice the covariance.  But if the correlation is high that would not be the case and the test would be much too conservative.

    I would not feel comfortable doing any kind of inference when looking at mean differences with only the mean and variance estimates specified.  I would still feel uncomfortable if the estimate of covariance (or correlation) is also specified.  Even though you can then estimate the variance for the mean difference I would not be sure that the t test would be appropriate without seeing the raw data.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 10.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 11:58
    I like Nayak's response. I encounter this often when doing sample size calculations for change from baseline, and the only data are summaries in journal articles that provide the baseline mean and standard deviation, the endpoint mean and standard deviation but not the correlation between baseline and endpoint or the standard deviation of the subject-specific differences. There's no choice but to make some assumptions, and I often assume the correlation is +0.5. Sensitivity analyses can also help.
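    The kind of calculation Dick describes can be sketched like this; all numbers are hypothetical, and the simple normal-approximation sample-size formula is one common choice among several:

```python
# Sample size for detecting a change from baseline, when only baseline and
# endpoint SDs are reported and the correlation must be assumed.
# All inputs below are illustrative placeholders.
import math
from scipy import stats

sd_base, sd_end, r = 14.0, 15.0, 0.5   # reported SDs; assumed correlation
delta = 6.0                            # change to detect
alpha, power = 0.05, 0.80

# SD of the subject-specific change, derived from the assumed correlation.
sd_change = math.sqrt(sd_base**2 + sd_end**2 - 2 * r * sd_base * sd_end)

# Normal-approximation sample size for a one-sample (paired) comparison.
z_a = stats.norm.isf(alpha / 2)
z_b = stats.norm.isf(1 - power)
n = math.ceil((z_a + z_b) ** 2 * sd_change**2 / delta**2)
print(sd_change, n)
```

Re-running this over a grid of r values is the sensitivity analysis Dick mentions.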

    -------------------------------------------
    Dick Bittman
    President
    Bittman Biostat, Inc.
    -------------------------------------------


  • 11.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 12:45
    To help illustrate one way I'm thinking about this problem, below I repeat my previous example and add six versions of information we might have about the desired correlation.  The first version is ideal, and the rest are more like the situation of interest to me -- especially Version 6.  Some of the versions might seem quite contrived; they're in the spirit of a Bayesian prior.  For each version -- that is, the previous example with certain additional information -- my question is, how can we use the given information to make inferences about the mean difference (e.g., hypothesis test, confidence or credible interval)?  I'd appreciate any further ideas.


    Previous example: Suppose 11 pairs of subjects contributed scores for samples A and B (e.g., the same subjects measured under conditions A and B, or subjects in group A matched with subjects from group B), but we're given only the samples' size (n = 11), means (MA = 5, MB = 7), and variances (VA = 12, VB = 10).  Note that the correlation between samples, say r, is not available; let's assume that if it were available a conventional related-samples t test would be appropriate for inference about the mean difference, and let's use rho to denote that sample correlation's estimand.


    Version 1: Suppose we also have the sample correlation for these data, say r = .6.  Then we can simply compute the difference scores' sample variance as

      12 + 10 - 2(.6)SQRT(12*10) = 8.855

    and, from this, the sample mean difference's standard error (SE) as SQRT(8.855 / 11) = 0.897; we can use this SE for inference based on a t distribution with 10 degrees of freedom.  This version is ideal but isn't the situation I'm interested in.
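    Version 1 can be checked in a few lines of code, which also produce the t-based 95% confidence interval from the same ingredients (a sketch of the standard calculation, nothing beyond it):

```python
# Version 1: with the sample correlation in hand, the usual related-samples
# t machinery applies directly.
import math
from scipy import stats

n, m_a, m_b, v_a, v_b, r = 11, 5.0, 7.0, 12.0, 10.0, 0.6
v_d = v_a + v_b - 2 * r * math.sqrt(v_a * v_b)   # variance of differences
se = math.sqrt(v_d / n)                           # SE of the mean difference
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (m_b - m_a - t_crit * se, m_b - m_a + t_crit * se)
print(round(v_d, 3), round(se, 3), [round(x, 3) for x in ci])
```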


    Version 2: Suppose we also know the correlation parameter, say rho = .6.  How can we make the desired inferences using the focal sample's summary statistics together with rho?  I suspect it's not appropriate to simply plug this correlation parameter into the above expression for the focal sample's variance of difference scores and proceed with the usual t-based inferences.


    Version 3: Suppose we also have the sample correlation from a different, independent sample, and let's assume this sample correlation also estimates rho.  How can we make the desired inferences using the focal sample's summary statistics together with this independent estimate of rho?


    Version 4: Suppose we also believe that the sample correlation for these data is either r = .5 (with probability .6) or r = .8 (with probability .4).  How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on r?  Can we somehow use each correlation value to obtain SEs and then combine these over the two values -- like we might when combining results over multiple imputations in certain missing-data problems?


    Version 5: Suppose we also believe that the correlation parameter is either rho = .5 (with probability .6) or rho = .8 (with probability .4).  How can we make the desired inferences using the focal sample's summary statistics together with this two-point probability distribution on rho?


    Version 6: Suppose we also believe that the correlation parameter follows a specific parametric distribution, such as rho ~ Beta(alpha, beta).  How can we make the desired inferences using the focal sample's summary statistics together with this continuous probability distribution on rho?



    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 12.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 14:44

    In cases 2 and 3 the t test might be a good way to do the frequentist inference.  For the t with 10 df to apply under the null hypothesis, the mean difference should have a normal distribution with mean 0 and the variance estimator should be an independent (scaled) chi-square with 10 degrees of freedom; that would make the t exactly the correct test.  But in cases 2 and 3 the denominator is not exactly chi-square with 10 df, and unless the observations are normally distributed for both groups the mean difference in the numerator will not be exactly normal either.  Still, I suspect in many instances the t would be a good approximation.  In the remaining scenarios you are putting a Bayesian prior on rho.  The full Bayesian approach would require a joint prior distribution on the two means, two variances, and rho, with the inference governed by the joint posterior distribution for these parameters.  But that requires knowing the likelihood function, which you can't construct without the actual observations.  So there is no full Bayesian solution to the problem.

    In all cases a proper inference is not possible without making assumptions that you cannot verify from the available information.  The versions of the problem that you provide all look somewhat contrived.  It looks more like you have some theoretical interest here rather than a practical interest.  If you do have a practical problem in mind I do not understand why you want to force out a solution when there is insufficient information for pure frequentist or Bayesian inference.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 13.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 22:23
    Now I feel like we're getting somewhere, Michael.  I admit that I don't yet understand your remark about why a full Bayesian approach wouldn't work; it's not that I see how to implement such an approach, but that I suspect a Bayesian approach could handle missing data about the sample correlation/covariance.  I'm happy to avoid any Bayesian thinking for now, but I'd eventually like to understand your remark better.

    Can we focus for a moment on my Version 4, where besides having the two related samples' size (11), means (5 and 7), and variances (12 and 10), we believe the sample correlation is either r = .5 (with probability .6) or r = .8 (with probability .4)?  If I can figure out a principled way to handle that, then Versions 2, 5, and 6 should be straightforward.

    (Aside: The example and versions/cases I described are indeed contrived, because for now I'm trying to think clearly about a specific problem without being distracted by real-data messiness.  I hope readers will bear with me ... or ignore this thread. :)

    Here's a simple hypothesis-test strategy for Version 4 that's appealing but seems wrong, though I'm not sure why: Given r = .5 the usual related-samples t test yields t = 1.996 and 2-tailed p = .07389, and given r = .8 the same test yields t = 3.136 and 2-tailed p = .01057.  Each of these p values is a conditional probability, given a value of r.  If we trust our "prior" on r, then we can simply compute an "overall"/marginal p value as the conditional p value's expectation with respect to this prior -- by the law of total probability -- which here yields p = .6(.07389) + .4(.01057) = .04857.
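    That calculation, coded directly (a transcription of the strategy just described, with no claim about its validity):

```python
# Version 4: condition on each candidate r, get the usual paired-t p value,
# then average over the two-point distribution (law of total probability).
import math
from scipy import stats

n, m_a, m_b, v_a, v_b = 11, 5.0, 7.0, 12.0, 10.0
prior = {0.5: 0.6, 0.8: 0.4}   # candidate r values and their probabilities

def p_given_r(r):
    """Two-sided paired-t p value, conditional on correlation r."""
    v_d = v_a + v_b - 2 * r * math.sqrt(v_a * v_b)
    t = (m_b - m_a) / math.sqrt(v_d / n)
    return 2 * stats.t.sf(abs(t), df=n - 1)

p_overall = sum(w * p_given_r(r) for r, w in prior.items())
print(round(p_overall, 5))     # matches the hand calculation, ~.0486
```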

    I can't yet give a compelling rationale for this strategy.  Meanwhile, here are some questions: If this strategy is valid, how exactly do we interpret the overall p value -- as a sort of hybrid Bayesian-frequentist probability?  If this strategy's not valid, why not, and is there a defensible way to combine t-test results or other quantities over the prior on r?  My intuition about the latter question is that we might be able to construct a reference distribution using some type of mixture distribution (e.g., with a different SE for each value of r).

    After finding or devising a valid hypothesis test in this situation, I'd like to understand how to construct an interval estimator for the mean difference, preferably without relying on the hypothesis test.  But for now I'd be happy to have a defensible test.



    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 14.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 07:14

    Adam:  I am not really interested enough in your problem to pursue this any further.  I actually thought I would stop participating in the discussion earlier.  But at least I want to address your question about the full Bayesian approach.  I am not saying that there is no way to construct some sort of Bayesian approach to your problem.  But the full Bayesian approach simply is to calculate a posterior distribution by multiplying prior by likelihood.  Without any raw data you just can't construct a likelihood function.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 15.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 11:43
    Adam,

    I think that your approach, while not truly Bayesian, makes perfect sense. The Bayesian approach treats quantities that we usually think of as fixed parameters as random variables, which is precisely what you are proposing. You are not updating your distribution of rho on the basis of observed data, which is why your proposed approach is not, strictly speaking, Bayesian (and why I would not refer to it as a "prior" distribution). But if the Bayesians can do what they do, then you should be able to do what you're suggesting.

    Of course, you will face the same challenge as do those who perform any Bayesian analysis, namely, how do you justify your probability distribution for rho?

    -- Tom

    -------------------------------------------
    Thomas Sexton
    Professor and Associate Dean
    Stony Brook University
    -------------------------------------------


  • 16.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 12:15
    The ordinary t-test is the same as the repeated measures t-test with a correlation of zero between the two measures.

    Can you argue on substantive grounds that the correlation between the repeated (i.e., condition, time, dyad member, etc.) measures cannot be negative? IFF so, then the ordinary t-test will give you a worst-case estimate. 

    If you can argue on substantive grounds that, say, a very plausible value for the correlation is .6, then you get a very plausible estimate.

    You can then present the worst-case estimate and the estimate based on the highly plausible assumption.


    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------


  • 17.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 12:22
    Yes, but the key "if" here is, as you say, whether "you can argue on substantive grounds."  Adam has not yet given us substantive grounds for any of the key assumptions.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 18.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 13:57
    Yes, I should have been more clear.

    By "IFF", I meant "if and only if".

    I should have asked what the substantive nature of the work is, and what arguments there would be that
    1) the correlation is not negative,
     and
    2) some particular correlation value is plausible.

    -------------------------------------------
    Arthur Kendall
    Social Research Consultants
    -------------------------------------------


  • 19.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 11:15
    Thanks, Nayak.  Your comments about trying different values of the correlation/covariance between measurements are close to what I had in mind, and I understand quite well what you described about the relation between this correlation and the t test's p value.  (I've understood that for maybe 15 years. :)

    What I'd prefer, however, is essentially a way to formally incorporate such a sensitivity analysis into one inferential result for the mean difference (the estimand of what you called Delta), such as a p value or confidence/credibility interval.  Intuitively, I can imagine a simulation in which we try many, many correlation values -- from a specified distribution of correlations -- and the desired result (e.g., p value) is combined appropriately over sampled values.  I just haven't figured out how to formalize this idea, and I'd like to know if there's a precedent in the stats literature.

    Here's a related idea: Is there some Bayesian-like way to obtain an "inflated" variance for the sample mean difference that incorporates our uncertainty about the correlation?  I can imagine, for instance, putting a prior (e.g., a Beta distribution) on the correlation parameter and essentially integrating over that prior to obtain a credible interval for the mean difference.
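    A Monte Carlo transcription of this idea might look like the following; the Beta(6, 4) prior is an arbitrary placeholder, and averaging conditional p values over the prior is one reading of the proposal, not an established procedure:

```python
# Monte Carlo sketch: draw rho from an assumed Beta prior, compute the
# conditional paired-t p value for each draw, and average over the draws.
# The Beta(6, 4) prior (mean 0.6) is an arbitrary placeholder.
import numpy as np
from scipy import stats

n, m_a, m_b, v_a, v_b = 11, 5.0, 7.0, 12.0, 10.0
rng = np.random.default_rng(42)
rho = rng.beta(6, 4, size=100_000)     # assumed prior on the correlation

v_d = v_a + v_b - 2 * rho * np.sqrt(v_a * v_b)   # per-draw diff variance
t = (m_b - m_a) / np.sqrt(v_d / n)
p_draws = 2 * stats.t.sf(np.abs(t), df=n - 1)
print(round(p_draws.mean(), 4))        # marginal p value over the prior
```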

    Finally, I realize I mentioned "meta-analysis" when presenting this problem, but I'm afraid that was an unnecessary distraction.  I'd rather just focus on solving the main problem I posed, for a single study with summary statistics from paired samples.


    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 20.  RE:Comparing related samples' means without correlation

    Posted 04-01-2012 16:28
    Have you considered a hierarchical approach to the data? You have information on the means and standard deviations of n studies and information about the correlation on a smaller number (m) of those studies. Assume that the n-m unknown correlations come from a population that can be reasonably characterized by the behavior of the m correlations. You might also see how the n standard deviations behave in the treatment group and the n standard deviations behave in the control group, and use that as a check. If the standard deviations are largely consistent, it would not be too outrageous to assume a comparable level of consistency in the correlations.

    You might even model this directly. Assume that the m 2x2 covariance matrices follow a hierarchical distribution and add in the information from the n-m diagonal entries for those studies where you don't know the off-diagonal entry.

    A couple other points. If you have the two means, the two standard deviations and something else, like a p-value or a confidence interval, you can back-calculate to get the correlation. Also, it is fairly standard practice these days to write to the individual investigators and request their raw data. They may or may not cooperate, but you should at least try.
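    Steve's back-calculation can be sketched as follows (the reported p value here is invented for illustration; the same inversion works starting from a reported t statistic):

```python
# Back-calculate the correlation from a reported two-sided paired-t p value
# plus n, the two means, and the two variances, by inverting the
# difference-variance identity V_d = V_A + V_B - 2*r*sqrt(V_A*V_B).
import math
from scipy import stats

n, m_a, m_b, v_a, v_b = 11, 5.0, 7.0, 12.0, 10.0
p_reported = 0.0739            # say the study reports this two-sided p

t = stats.t.isf(p_reported / 2, df=n - 1)   # |t| implied by the p value
se = abs(m_b - m_a) / t                     # SE of the mean difference
v_d = se ** 2 * n                           # variance of the differences
r = (v_a + v_b - v_d) / (2 * math.sqrt(v_a * v_b))
print(round(r, 2))
```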

    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------


  • 21.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 09:46
    This one reminds me of a Fisher quote that I'll take slightly out of context, "To call in the statistician after....may be no more than asking him to perform a post-mortem...."

    -------------------------------------------
    Emil M Friedman, PhD
    emil.friedman@alum.mit.edu (forwards to day job)
    emilfrie@alumni.princeton.edu (home)
    http://www.statisticalconsulting.org
    -------------------------------------------


  • 22.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 10:41
    The Fisher post-mortem quotation seems apt.  Meta-analysts usually have to make the best of whatever results primary-study authors have chosen to report; contacting authors for better data is often disappointingly ineffective.  It's easy to throw one's hands up and declare the available data inadequate.  It's harder -- but perhaps more valuable for scientific progress -- to find creative ways to exploit the available data while admitting their limitations.

    Meta-analysts might find inspiration in this quotation from Jagger and Richards (1969):

        You can't always get what you want
        But if you try sometimes, well you might find
        You get what you need
        Oh yea-ay (hey-hey-hey, oooh)


    Cheers,

    Adam

    -------------------------------------------
    Adam Hafdahl
    Statistical Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------


  • 23.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 10:55
    I partially agree with Adam's comment.  If there are reasonable ways to do an analysis with partial information then it may be helpful.  But sometimes these creative ways may not be reasonable, and then doing them can be misleading and worse than doing nothing.  Also, I think this is a case where the data exist and he just doesn't have access to them.  I think the right way to proceed would be to make every possible effort to get the complete data first.  But if that can't be done, don't play games by making some unverifiable assumptions just so that you can get "some" answer.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 24.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 12:00
    I largely agree with you, Michael.  In many of the dozen-ish published meta-analyses I've been involved with, the PI has requested better data from primary-study authors.  Sometimes that works; often it doesn't.  Besides my anecdotal claims, there's published evidence that obtaining results or raw data from authors is often attempted but is much less successful than meta-analysts would like:

        Gibson, C. A., Bailey, B. W., Carper, M. J., Lecheminant, J. D., Kirk, E. P., Huang, G., et al. (2006). Author contacts for retrieval of data for a meta-analysis on exercise and diet restriction. International Journal of Technology Assessment in Health Care, 22, 267-270.

        Mullan, R. J., Flynn, D. N., Carlberg, B., Tleyjeh, I. M., Kamath, C. C., LaBella, M. L., et al. (2009). Systematic reviewers commonly contact study authors but do so with limited rigor. Journal of Clinical Epidemiology, 62, 138-142.

    For decades research synthesists have lamented this state of affairs and called for better reporting and data-sharing practices.  Improvement is slow.

    On your point about "playing games" to get a result, I think it's quite difficult to make judgment calls about when the available data are so poor -- and required assumptions so untenable -- that no analysis is better than a crude analysis.  There's a lot of grey area between a pristine analysis and utter crap.  This isn't unique to meta-analysis, though these problems might be exacerbated when we don't have access to raw data.  We statisticians nearly always work with data that don't conform exactly to our statistical models, and our analyses nearly always involve assumptions that can't be verified with certainty.  As I see it, the most important principle is to be transparent about choices and procedures so potential critics or skeptics can evaluate what was done and others can replicate the study.



    -------------------------------------------
    Adam Hafdahl
    Owner & Principal Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------








  • 25.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 12:19

Your points are well-taken.  The method of analysis should be transparent, and at least the reader will know how the results are limited by the assumptions.  Now, no assumption is fully verifiable.  But if you have the raw data you can check whether or not the data appear to have come from a normal distribution.  You can also obtain an unbiased estimate of the variance of the mean difference when all the raw data are available.  You can also construct priors to represent what you think the correlation might be.  But do you really have a reasonable idea of what it might be without seeing the data?  Since you are looking at differences with respect to the same patients, I would expect the correlation to be positive but wouldn't know much more than that.  When this type of information is not available for a meta-analysis, I would much prefer just combining p-values (authors almost always report the p-value for the test).  Speculating about correlations, and assuming a t test would be appropriate when you have no data at all to support those assumptions, I think is wrong and is playing games to arrive at an answer.  You can always report the estimates, your computed confidence intervals, and/or p-values for the test(s) with the qualifying assumptions.  But why do it when it is so dubious?  At least it seems dubious to me.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 26.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 14:26
    Just four remarks for now, then I'll try to tear myself away from this thread for today:

    1. Reminder and Apology: The artificial problem/question I posed about comparing related samples' means was meant to help me understand certain aspects of this relatively simple problem that I could apply to my client's much more complicated actual problem.  If that simpler problem seems artificial or my discussion of it seems unusual, that may be partly because I'm trying to steer the conversation toward things that will be useful in the actual problem.  Sorry if that's awkward or if it's bothersome to work within the constraints of the simpler problem I posed, but I'm not inclined to discuss messy details of the actual problem.

    2. Evidence About Correlations: In the client's actual problem we do have data to inform some of our choices.  For instance, we'll probably be able to obtain at least some correlations between endpoints (i.e., outcome variables or occasions) for some studies, and the client anticipates being able to obtain raw data from at least one study.  So, although I share concerns expressed by Michael and others (Tom? Art?) about justifying choices for these correlations, for the sake of discussion I've tried to simplify matters by supposing we have some empirical basis for making choices like the scenarios I described in my six versions of correlation information.

    3. Combining P Values: I'm familiar with methods for combining p values, but I doubt they'll be helpful in my client's actual problem.  I won't elaborate on this, except to say that (a) most methods for combining p values require assumptions that are pretty likely violated in our situation (e.g., independence, homogeneity), and (b) combining p values won't get us some of the meta-analytic results we're interested in (e.g., estimates of within-study and between-studies heterogeneity).  That's why I'm largely ignoring suggestions to use methods for combining p values.

    4. Standard Error for Mean Difference: In the simpler problem/question I posed, about comparing related samples' means, I've focused so far on testing the mean difference.  I'd hoped discussing that specific task would give me insights about how to make defensible inferences by incorporating certain types of information about the correlation between samples' scores.  I'm perhaps more interested, however, in obtaining a standard error (SE) for the estimated mean difference that appropriately incorporates any information we have about the correlation.  Understanding that would help me devise strategies for the messier actual problem (i.e., covariance matrix for each study's vector of multiple-endpoint estimates).  If I can make time, I might post a slightly modified version of my simple problem that refocuses on this SE.
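    To make the dependence on the correlation concrete, here is a minimal sketch of the standard error of the mean difference for the simplified equal-n paired design Adam posed.  The function name and the illustrative numbers are mine, not from the thread; the point is just that r enters the SE through the cross term.

    ```python
    import math

    def mean_diff_se(s1, s2, n, r):
        """SE of the mean difference for n paired observations,
        given sample SDs s1, s2 and an assumed correlation r."""
        var_diff = s1**2 + s2**2 - 2 * r * s1 * s2
        return math.sqrt(var_diff / n)

    # Illustrative numbers (not from any real study):
    se_indep = mean_diff_se(1.0, 1.2, 30, 0.0)   # ignoring the pairing
    se_corr  = mean_diff_se(1.0, 1.2, 30, 0.6)   # assuming r = 0.6
    ```

    Any scheme that puts uncertainty on r (a prior, an outside estimate) ultimately has to propagate that uncertainty through the `2*r*s1*s2` term above.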



    -------------------------------------------
    Adam Hafdahl
    Owner & Principal Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------








  • 27.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 15:06
    My comments about combining p-values were made because that is the only meta-analytic approach I would use in the absence of the raw data.  Also, the only restrictions I know of for the p-value combination are independence of the studies and similarity of the endpoints, and I think those would be required for almost any meta-analysis.  If Adam can't be forthcoming about the real problem, then I think he is wasting people's time, because we don't have a clear idea of what data are available and what aren't, or how the planned meta-analysis is related to correlations between variables that are differenced in his artificial problems.
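    For readers less familiar with p-value combination, Fisher's method is the classic instance of what is being suggested here.  A minimal sketch, assuming independent studies (the function name and the example p-values are mine); for even degrees of freedom the chi-square tail probability has a closed form, so no external library is needed:

    ```python
    import math

    def fisher_combine(pvals):
        """Fisher's method: under the global null, -2*sum(log p_i)
        is chi-square with 2k df for k independent p-values."""
        k = len(pvals)
        x = -2.0 * sum(math.log(p) for p in pvals)
        # Chi-square survival function with even df = 2k (closed form):
        half = x / 2.0
        tail = math.exp(-half) * sum(half**i / math.factorial(i) for i in range(k))
        return x, tail

    # Three hypothetical independent studies, each individually marginal:
    stat, p_combined = fisher_combine([0.08, 0.12, 0.20])
    ```

    Note that this delivers only an overall test of the global null; as Adam points out below, it does not yield heterogeneity estimates.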
    I will drop out of the discussion with the note that I still think one needs to be very careful about making unverifiable assumptions that could have a major impact on the results and conclusions of a study.  Just because a client wants a certain kind of analysis doesn't mean the statistician should come up with a clever but probably invalid approach.  Do only what is reasonable for the data at hand.  Use prior assumptions only if you have a sound basis for them.

    Also there seems to be a lot of attention to estimating the unknown correlations.  As I said before, if you have the observed paired differences you don't even need to calculate correlations as a step in finding the standard error for the mean difference.
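    Michael's last point is worth making concrete: with raw paired data, the covariance between the two measurements is absorbed into the variance of the differences, so the correlation never has to be estimated as a separate step.  A minimal sketch (the function name and the toy numbers are mine):

    ```python
    import math

    def paired_t(x, y):
        """One-sample t on the paired differences; the covariance
        between x and y is absorbed into the SD of the differences."""
        d = [a - b for a, b in zip(x, y)]
        n = len(d)
        mean_d = sum(d) / n
        var_d = sum((di - mean_d) ** 2 for di in d) / (n - 1)
        se = math.sqrt(var_d / n)
        return mean_d / se  # refer to a t distribution with n-1 df

    # Toy paired measurements (hypothetical):
    pre  = [5.1, 4.8, 6.0, 5.5, 5.9]
    post = [5.6, 5.0, 6.4, 5.7, 6.5]
    t = paired_t(post, pre)
    ```

    The whole difficulty in Adam's simplified problem is that only `mean`, `sd`, and `n` per condition are reported, so this direct route is unavailable.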

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 28.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 16:54
    Michael,

    I genuinely appreciate your numerous contributions to this thread and to this forum (and others) more generally.  I must confess, however, that I sometimes find the content of your messages obscured by your occasionally off-putting tone.  I try to give you the benefit of the doubt and assume that you don't intend to be condescending or disrespectful, but occasionally that assumption seems unwarranted.

    Here's a good example from your most recent post: "If Adam can't be forthcoming about the real problem then i think he is wasting peoples time ... ."  I've explained repeatedly why I deliberately posed a question that's much simpler than the client's actual problem: I didn't mean to be deceptive; I was trying to focus the discussion on specific issues I'd like to understand so I can address the much more complicated actual problem myself.  In my judgment, posing this simpler problem was a more efficient way to accomplish my purpose than delving into numerous messy complications in the client's real problem.  That you (evidently) disagree is understandable, given our different purposes for engaging in this discussion.

    As I see it, no one's obliged to reply to my or anyone else's posts on this forum.  If anyone feels s/he's wasted time by reading, contemplating, or responding to any of my posts, I'd recommend that s/he reevaluate why s/he bothered to pay attention to the post in the first place.  I'm grateful when someone offers thoughts on questions I raise, and I especially value efforts to work within any constraints I've specified -- artificial as they might seem.  When others post something of interest to me, I'll try to extend them the same courtesy.


    Cheers,

    Adam

    -------------------------------------------
    Adam Hafdahl
    Owner & Principal Consultant
    ARCH Statistical Consulting, LLC
    -------------------------------------------








  • 29.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 15:40
    Adam,


    If your paired observations are independent, then the estimated variance of the difference (if one knew the sample covariance) is always less than or equal to 2 times the sum of the two sample variances, so using the square root of 2 times the sum in the denominator of the t test gives a totally defensible conservative test.

    If you have two possible values for r, each with a probability (the two probabilities summing to 1), you would combine the information by calculating the estimated variance for both values of r, taking the sum weighted by the probabilities, and then doing the test using that weighted sum as the estimate of the variance of the difference.
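    A minimal sketch of both ideas, for the equal-n paired design discussed upthread (function names and illustrative numbers are mine).  The bound holds for any r in [-1, 1] because the variance of a difference is largest at r = -1, where it equals (s1 + s2)^2, which never exceeds 2*(s1^2 + s2^2):

    ```python
    import math

    def se_bound(s1, s2, n):
        """Conservative SE for the mean difference: the variance of a
        difference is at most (s1+s2)^2 <= 2*(s1^2 + s2^2) for any r."""
        return math.sqrt(2 * (s1**2 + s2**2) / n)

    def se_weighted(s1, s2, n, rs, probs):
        """Mixture SE: average the variance over candidate correlations rs,
        weighted by probabilities probs (which should sum to 1)."""
        var = sum(p * (s1**2 + s2**2 - 2 * r * s1 * s2)
                  for r, p in zip(rs, probs))
        return math.sqrt(var / n)

    # Illustrative comparison with equal SDs and n = 8 pairs:
    se1 = se_bound(1.0, 1.0, 8)
    se2 = se_weighted(1.0, 1.0, 8, rs=[0.2, 0.6], probs=[0.5, 0.5])
    ```

    With plausibly positive correlations, the weighted SE is well below the conservative bound, which illustrates Michael's follow-up concern about how much power the bound can cost.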


    Margot Tollefson

    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 30.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 15:55
    Everything you say is true, but a concern with a defensibly conservative test is how conservative it is.  We don't know.  If it is very conservative it lacks power; you get results, but what do they really mean?  I suppose weighting the variance conditional on the correlation, based on a Bayesian prior distribution for the correlation parameter, makes sense.  But everything really hinges on how reasonable the prior assumption about the correlation is.  Calculating a result is not the issue.  The issue is whether or not the approach is reasonable.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 31.  RE:Comparing related samples' means without correlation

    Posted 04-02-2012 16:12
    One last comment.  The test is conservative only if the correlation is positive.  Of course, one would do paired differencing to reduce variance when a positive correlation is expected, as is the case with two measurements on the same patient.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------