ASA Connect

  • 1.  The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-22-2021 13:38
    Hi everyone, 

    In an unrelated matter, I must confess that I am stumped by how to formulate hypotheses for a directional Wilcoxon signed-rank test and how to interpret the ensuing results. 

    I am using R's wilcox.test() function to analyze data collected in a before-after study. The syntax I used looks like this: 

    wilcox.test(x = after_scores, y = before_scores,
                alternative = "greater", mu = 0, paired = TRUE,
                exact = NULL, correct = TRUE, conf.int = TRUE, conf.level = 0.95)

    As an example:

    set.seed(146)

    before_scores <- rnorm(n = 10, mean = 20, sd = 3)
    after_scores <- rnorm(n = 10, mean = 24, sd = 3)

    wt <- wilcox.test(x = after_scores, y = before_scores,
                      alternative = "greater", mu = 0, paired = TRUE,
                      exact = NULL, correct = TRUE,
                      conf.int = TRUE, conf.level = 0.95)

    wt

    > wt

    Wilcoxon signed rank exact test

    data: after_scores and before_scores
    V = 50, p-value = 0.009766

    alternative hypothesis: true location shift is greater than 0

    95 percent confidence interval:
    1.014856 Inf

    sample estimates:
    (pseudo)median
    4.010822

    In my mind, I thought I could set up the null and alternative hypotheses like this: 

    Ho: The after scores are NOT generally greater than the before scores.
    Ha: The after scores ARE generally greater than the before scores.

    so that, when I reject Ho, I could state that the data provide evidence that the after scores are generally greater than the before scores.

    But then I looked at R's help file for wilcox.test() and got all confused: 

    If only x is given, or if both x and y are given and paired is TRUE, a Wilcoxon signed rank test of the null that the distribution of x (in the one sample case) or of x - y (in the paired two sample case) is symmetric about mu is performed.
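    The help text reduces the paired case to a one-sample test on x - y. A minimal R sketch can check this directly. It uses hypothetical before/after scores (not the data above), chosen so that the differences are distinct and nonzero and R therefore uses the exact test. The paired call and the one-sample call on after_scores - before_scores should return the same V and p-value, so with x = after_scores, alternative = "greater" is a statement about the distribution of after - before.

```r
# Hypothetical scores: differences after - before are distinct and nonzero
before_scores <- c(18.2, 21.5, 19.9, 22.4, 17.8, 20.6, 23.1, 19.4, 21.0, 18.7)
after_scores  <- c(22.3, 24.7, 23.5, 21.9, 22.2, 25.7, 26.0, 20.1, 22.8, 21.0)

# Paired two-sample form, with x = after and y = before
paired_fit <- wilcox.test(x = after_scores, y = before_scores,
                          alternative = "greater", mu = 0, paired = TRUE)

# One-sample form on the differences x - y
diff_fit <- wilcox.test(after_scores - before_scores,
                        alternative = "greater", mu = 0)

# Both forms give the same signed-rank statistic V and the same exact p-value
c(paired_V = unname(paired_fit$statistic), diff_V = unname(diff_fit$statistic))
c(paired_p = paired_fit$p.value, diff_p = diff_fit$p.value)
```

    Only the smallest absolute difference is negative here, so V = 55 - 1 = 54 and the exact p-value is P(V >= 54) = 2/1024.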

    How do I translate this into directional hypotheses?  For non-directional hypotheses, I would state:

     Ho:  The distribution of after - before scores IS symmetric about 0;
     Ha:  The distribution of after - before scores IS NOT symmetric about 0. 

    But what I really want to know is whether the after scores show an improvement over the before scores. So, for directional hypotheses, can I state something like:

    Ho: The majority of after - before scores were less than or equal to 0.
    Ha: The majority of after - before scores were greater than 0 (aka the majority of after scores were an improvement over their before counterparts)?

    I am thinking that a directional Ha would state that the distribution of "after - before" differences is asymmetric about zero with the majority of that distribution sitting above 0 but I am not sure that is how a directional hypothesis for the Wilcoxon signed-rank test should actually be stated.  Any clarity on this would be greatly appreciated.  

    Many thanks, 

    Isabella

    ------------------------------
    Isabella R. Ghement, Ph.D.
    Ghement Statistical Consulting Company Ltd.
    E-mail: isabella@ghement.ca
    Phone: 604-767-1250
    ------------------------------


  • 2.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-23-2021 15:21
    I think your interpretation (the last block of text in pink) is an appropriate statement of null and alternative hypotheses for this test.

    ------------------------------
    John Kolassa
    Rutgers University
    ------------------------------



  • 3.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-24-2021 06:48
    Yes, and I think if you swear on a copy of "Individual Comparisons by Ranking Methods" that you would not have rejected the null and claimed Ha' (that the after - before scores were less than 0), you can halve the p-value that the function returns for a two-sided test.
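    The halving rule can be checked with a short R sketch on hypothetical paired differences (after - before). When the observed statistic falls on the side named in advance, the one-sided p-value R reports for alternative = "greater" is exactly half the two-sided p-value, while the opposite one-sided p-value is close to 1.

```r
# Hypothetical paired differences (after - before): distinct, nonzero, no ties
d <- c(4.1, 3.2, 3.6, -0.5, 4.4, 5.1, 2.9, 0.7, 1.8, 2.3)

p_two     <- wilcox.test(d, alternative = "two.sided")$p.value
p_greater <- wilcox.test(d, alternative = "greater")$p.value
p_less    <- wilcox.test(d, alternative = "less")$p.value

# V lies in the "greater" direction here, so halving the two-sided p-value
# reproduces the one-sided "greater" p-value; the "less" p-value is near 1
c(two_sided = p_two, half_two_sided = p_two / 2,
  greater = p_greater, less = p_less)
```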

    ------------------------------
    Tom Parke
    ------------------------------



  • 4.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-24-2021 08:14
    I believe this is a general source of confusion about hypothesis tests, not specific to the Wilcoxon test.

    For example, I have just been teaching a course in linear regression based on a different text from what I was using previously. For a one-sided test, this text would formulate H0 as "beta less than or equal to 0" and H1 as "beta greater than 0" where beta is the coefficient being tested (for example, in a simple y versus x regression, beta is the slope of the regression). The other way to write it is H0: beta=0 and the same H1. Which formulation more accurately represents our intentions when conducting such a test? For the purpose of constructing a test statistic with a predetermined rejection probability, the second way of formulating the problem is definitely simpler. That's why I prefer to teach it that way, but the first approach obviously has its adherents as well. I think this is just one of those cases where we should all accept that there isn't a universally agreed "right way to do it" and be adaptable to differences among textbooks and software packages.
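    A small R sketch of the regression case, using hypothetical data: summary() reports a two-sided p-value for the slope, and the one-sided p-value for H1: beta > 0 comes from pt(). The number is the same whether H0 is written as beta = 0 or beta <= 0, because over beta <= 0 the rejection probability of this test is largest at the boundary beta = 0.

```r
# Hypothetical simple-regression data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.8)

fit    <- lm(y ~ x)
tab    <- coef(summary(fit))
tval   <- tab["x", "t value"]
res_df <- fit$df.residual

# Two-sided p-value as printed by summary(): H1 is beta != 0
p_two <- tab["x", "Pr(>|t|)"]

# One-sided p-value for H1: beta > 0 (upper tail of the t distribution)
p_greater <- pt(tval, df = res_df, lower.tail = FALSE)

c(t = tval, two_sided = p_two, greater = p_greater)
```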

    Going back to your specific question about Wilcoxon, it seems to me that by offering the options alternative = 'greater', alternative = 'less' and alternative = 'two.sided', R adjusts the p-values appropriately. It does look to me as though alternative = 'greater' is the correct choice for the testing situation you describe.

    ------------------------------
    Richard Smith
    University of North Carolina
    ------------------------------



  • 5.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-24-2021 08:26
    Sorry, but I do not agree with the "majority" interpretation. That sounds more like a sign test. Wilcoxon can be significant if the numbers of positive and negative changes are the same, but the changes in one direction are bigger.

    Take this set of data, stack 6 copies of it (to get N = 60), and run the test. I get p = 0.047.

    before  after
      1.00   0.00
      3.00   2.10
      4.00   3.30
      5.00   3.30
      6.00   4.70
      1.00   4.00
      3.00   7.00
      4.00   8.50
      5.00   5.50
      6.00  11.00

    The first 5 are small decreases. The last 5 are large increases. 
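    Ed's example can be reproduced with an R sketch. It assumes the first column is the before score and the second the after score, which matches his description of the first five pairs as decreases. Six stacked copies give 30 positive and 30 negative changes, so a sign test (binom.test on the count of increases) has p = 1. The two-sided signed-rank test, which falls back to the normal approximation because the copies create ties, should land near the p = 0.047 reported above.

```r
# Ed's ten (before, after) pairs from the post
before <- c(1, 3, 4, 5, 6, 1, 3, 4, 5, 6)
after  <- c(0, 2.1, 3.3, 3.3, 4.7, 4, 7, 8.5, 5.5, 11)

# Six stacked copies give N = 60 pairs
before60 <- rep(before, times = 6)
after60  <- rep(after,  times = 6)

# Two-sided signed-rank test; ties rule out the exact test, so R warns
# and uses the normal approximation (warnings suppressed here)
sr <- suppressWarnings(wilcox.test(after60, before60, paired = TRUE))

# Sign test on the same pairs: number of increases among nonzero changes
n_pos     <- sum(after60 > before60)
n_nonzero <- sum(after60 != before60)
sgn <- binom.test(n_pos, n_nonzero, p = 0.5)

c(signed_rank_p = sr$p.value, sign_test_p = sgn$p.value,
  increases = n_pos, nonzero = n_nonzero)
```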

    Even in the two-tailed case, you will need to interpret the nature of the asymmetry if the result is significant. So, for a one-tailed test, I might say that the distribution was asymmetric in the positive direction.

    Realistically, most people interpret it as indicating an average (maybe median, since it is nonparametric) change unless there is some reason to be picky.

    Ed

    ------------------------------
    Edward Gracely
    Drexel University
    ------------------------------



  • 6.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-24-2021 13:18
    I like John's response. I don't think there is any implication of symmetry in the shape of the distribution which would rule out your first set of hypotheses in pink.

    ------------------------------
    Jamis Perrett
    Bayer US- Crop Science
    ------------------------------



  • 7.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-24-2021 07:25
    I agree that this test (for paired differences) is pesky. I think it's annoying that the test is often wrongly described, especially in texts with the "cookbook" approach ("use this test if your data are non-normal", "this test is about differences in medians"). Ironically, Wilcoxon assumed a normal distribution of the differences in his original paper.

    I happened to spend (too) much time on the logic of this test in the past. Here's what I remember:

    The test statistic is based on an ordering of the absolute values of the differences. Then, in this ordering, we replace the absolute values by + or -, depending on the sign of the differences. This results in a sequence of + and - signs. Now we assume that every possible sequence of + and - signs is equally likely (assumption A).

    After that, sums of signed ranks and p-values are calculated. The null hypothesis for the two-sided test is this assumption A, technically speaking.
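    Assumption A pins down the null distribution completely, and for small n it can be enumerated. The R sketch below lists all 2^n sign patterns over the ranks 1..n, computes V (the sum of the ranks carrying a + sign) for each, and compares the upper-tail probability for the V = 50, n = 10 result in the original post with R's built-in psignrank().

```r
# Exact null distribution of V under assumption A, by brute-force enumeration
n <- 10
ranks <- seq_len(n)

# Each row is one equally likely sign pattern: 1 = "+", 0 = "-"
signs <- as.matrix(expand.grid(rep(list(c(0, 1)), n)))

# V = sum of the ranks that carry a "+" sign, for every pattern
V_all <- as.vector(signs %*% ranks)

# Upper-tail p-value for the observed V = 50 from the original post
p_enum    <- mean(V_all >= 50)
p_builtin <- psignrank(49, n, lower.tail = FALSE)  # P(V > 49) = P(V >= 50)

c(enumerated = p_enum, psignrank = p_builtin)
```

    Both give 10/1024 = 0.009765625, the p-value printed in the original post.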

    Which distributions satisfy assumption A? Clearly, continuous distributions that are symmetric around zero. (For discrete distributions with non-zero probability for 0, we would have to exclude the 0s from the data.) Are there other (asymmetric) distributions that satisfy assumption A? I don't know, probably not, but maybe one can construct some weird distribution with this property. Anyway, the usual formulation of the null hypothesis is therefore "Differences have a symmetric distribution with mean zero." Because of the symmetry, this is equivalent to "differences have a symmetric distribution with median zero".

    If the null is not true, this means "differences do not have a symmetric distribution with mean zero". What does this mean? It could be
    a) Differences have a symmetric distribution, but not around zero
    b) Differences have mean zero, but are not symmetric
    c) Differences have median zero, but are not symmetric
    d) Differences have mean different from zero and are not symmetric
    e) Differences have median different from zero and are not symmetric
    f) Differences have median different from zero and mean different from zero and are not symmetric.

    Pretty inconclusive. So we pretty much don't know anything when the null is rejected.

    It gets worse if we want an interpretation in the before - after framework. If we think that differences are not symmetric about zero, does this say something about the "number of positive before - after comparisons"? I'd argue: nothing at all.

    OK, so we could raise a flag: only use this test if we can assume (beyond a reasonable doubt) that differences are symmetric. Then the (additional) null hypothesis would simply be "differences have mean (or median) 0". If we reject the null, we conclude that the differences have a mean (and median) different from zero.

    Since the difference of means is the mean of the differences, this results in "mean before" and "mean after" being different. However, we cannot safely conclude that "median before" and "median after" are different, because the median of the differences is not equal to the difference of the medians.
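    A three-pair R sketch with hypothetical numbers makes the contrast concrete: the mean of the differences always equals the difference of the means, but here the median of the differences is 1 while the difference of the medians is 2.

```r
# Hypothetical paired data, chosen only to illustrate the point
before <- c(1, 2, 9)
after  <- c(4, 3, 5)
d <- after - before  # 3, 1, -4

# Means: always equal (linearity of the mean)
c(mean_of_diffs = mean(d), diff_of_means = mean(after) - mean(before))

# Medians: not equal in general
c(median_of_diffs = median(d), diff_of_medians = median(after) - median(before))
```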

    Unless we also assume that the before values and the after values are symmetrically distributed, which does not follow from the symmetry of the differences. Or unless we assume that the shapes of the distributions of the before values and the after values are exactly the same.

    These are a lot of assumptions for a test that is often sold with a tag "no assumptions needed".

    Now for the one-sided test. If we include symmetry of the differences in the null hypothesis, I'd argue again that a rejection of the null doesn't tell us anything useful.

    So we state the assumption "differences have a symmetric distribution" outside of the null. Then the null might be "mean of differences <= 0". Rejection of the null then means "mean of differences > 0". Because of symmetry, this is equivalent to "median of differences > 0". Again, this is not equivalent to "difference of medians > 0".

    I'd agree with you that this means "the majority of after-scores were an improvement". But it all depends on the assumption of symmetry of differences, which is an assumption outside of the null.

    My own question is: if the data are symmetrically distributed, the sample mean approaches normality rather quickly. So why is the Wilcoxon signed-rank test preferable to the t-test for moderate sample sizes? If the differences are not symmetric, neither the Wilcoxon signed-rank test nor the t-test is appropriate. If the differences are symmetric, the sample mean is approximately normal in moderate sample sizes, so the t-test should be approximately fine. (And if the sample size is 5 or 10, should we really use inferential statistics at all?)
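    One way to probe this question empirically is a small simulation. The R sketch below compares the rejection rates of t.test() and wilcox.test() on symmetric differences. The sample size (20 pairs), the shift (0.7), the number of replications and the two distributions (t with 3 df as a heavy-tailed case, and the normal) are all hypothetical choices, so the output only describes those settings and does not settle the question in general.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Share of n_sim simulated samples in which each test rejects at the 5% level.
# Differences are draws from rdist (symmetric about 0) plus a location shift.
reject_rates <- function(rdist, shift, n_pairs = 20, n_sim = 2000) {
  rej <- replicate(n_sim, {
    d <- rdist(n_pairs) + shift
    c(t = t.test(d)$p.value < 0.05,
      wilcoxon = wilcox.test(d)$p.value < 0.05)
  })
  rowMeans(rej)  # rej is a 2 x n_sim logical matrix
}

# Hypothetical choices of symmetric difference distributions
heavy <- function(n) rt(n, df = 3)  # heavy-tailed
light <- function(n) rnorm(n)       # normal

size_heavy  <- reject_rates(heavy, shift = 0)    # both nulls true
power_heavy <- reject_rates(heavy, shift = 0.7)
power_light <- reject_rates(light, shift = 0.7)

rbind(size_heavy, power_heavy, power_light)
```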

    ------------------------------
    Hans Kiesl
    Regensburg University of Applied Sciences
    Germany
    ------------------------------



  • 8.  RE: The pesky Wilcoxon signed-rank test and its interpretation

    Posted 06-25-2021 09:34
    In the spirit of estimation over testing, the sign test (not the signed-rank test) can be viewed as estimating:

    P(Y>X)-P(X>Y). It is more robust, as it makes no assumptions. For continuous data, each paired comparison yields an outcome of 1 or -1, so the count of +1 outcomes has a binomial distribution.
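    A minimal R sketch of this estimation view, using hypothetical continuous before/after data with Y = after and X = before: the mean of the signs estimates P(Y>X)-P(X>Y), and because the count of +1 outcomes is binomial, an exact binom.test() interval for p = P(Y>X) maps to one for 2p - 1 when there are no zero differences.

```r
# Hypothetical paired data (continuous, so no zero differences)
before <- c(18.2, 21.5, 19.9, 22.4, 17.8, 20.6, 23.1, 19.4, 21.0, 18.7)
after  <- c(22.3, 24.7, 23.5, 21.9, 22.2, 25.7, 26.0, 20.1, 22.8, 21.0)

# Each pair contributes +1 if after > before and -1 if after < before
s <- sign(after - before)

# Point estimate of P(Y > X) - P(X > Y)
delta_hat <- mean(s)

# Exact binomial interval for p = P(Y > X), mapped to delta = 2p - 1
bt <- binom.test(sum(s == 1), length(s))
delta_ci <- 2 * bt$conf.int - 1

c(estimate = delta_hat, lower = delta_ci[1], upper = delta_ci[2])
```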

    For the signed-rank test, you can do something analogous, but dependencies come in.

    Here, P(Y>X)-P(X>Y) is composed of two components:

    Events 1: Wj=1 if Yj>Xj; Wj=-1 if Yj<Xj.

    Events 2: for j ≠ k, Vjk=1 if Yj>Xk; Vjk=-1 if Xk>Yj.

    Now you have to take the variances and covariances into account.  

    It seems reasonable to presume P(Wj=1)=P(Vjk=1).

    So you can work out the variances and covariances: the Wj are independent, and Wi is independent of Vjk if i, j, k are distinct.
    Vjk and Vil are independent if there is no overlap in the indexes. You have to work out the 3 overlapping-index covariances, per Lehmann's book.

    The normal approximation needs only relatively small sample sizes.

    For those who prefer hypothesis tests, the null is P(Y>X) = P(X>Y); no symmetry is required.


    ------------------------------
    Jonathan Shuster
    ------------------------------