First of all, the use of a sequence of p-values as test cut-offs in a group sequential design was, and still is, motivated by the desire to maintain an upper bound on the overall false positive probability with repeated testing, and the particular numerical p-value cut-offs depend on the particular design. This leads to many silly consequences, since a given data set, either interim or final, can lead to different decisions, depending on how the data were generated and what one intended to do.
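To illustrate why repeated testing needs adjusted cut-offs, here is a minimal simulation sketch. It is not part of the original discussion and rests on assumptions chosen only for illustration: equally spaced looks, a standard normal test statistic under the null, and the same unadjusted two-sided cut-off of 1.96 at every look. The function name and its parameters are hypothetical.

```python
import random


def overall_false_positive_rate(num_looks, z_cutoff=1.96, num_trials=20000, seed=1):
    """Estimate P(reject at any look) under the null by simulation.

    Each look adds one group of data, summarized as a N(0, 1) increment.
    The test statistic at look k is the cumulative sum divided by sqrt(k),
    which is standard normal under the null hypothesis.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(num_trials):
        cumulative = 0.0
        for look in range(1, num_looks + 1):
            cumulative += rng.gauss(0.0, 1.0)
            z = cumulative / look ** 0.5
            if abs(z) > z_cutoff:
                # Stop at the first look that crosses the unadjusted cut-off.
                rejections += 1
                break
    return rejections / num_trials


one_look = overall_false_positive_rate(1)
five_looks = overall_false_positive_rate(5)
print(f"one look:   {one_look:.3f}")
print(f"five looks: {five_looks:.3f}")
```

With a single look the estimated rate should sit near the nominal .05, while five unadjusted looks push it well above that, which is the inflation that design-specific cut-offs are meant to prevent.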
A simple, much cited example of this lunacy (Berger and Wolpert, 1984) is a dataset consisting of 9 responses in a sample of size 12, with the aim to perform a one-sided test of Pr(response) = theta = .50 versus theta > .50. The p-value = .0730 if the data were obtained by simply sampling 12 subjects, implying a binomial distribution, but the p-value = .0327 if the data were obtained by sampling until 3 failures (non-responses) were observed, since this implies a negative binomial distribution. If one has the data but does not know what was intended, one cannot compute a p-value. If, instead, one takes a Bayesian approach and assumes that theta follows a beta (.50, .50) prior, then one can compute the posterior probability Pr( theta > .50 | 9 responses in 12 subjects) = .96. Of course, a posterior 96% credible interval for theta is .471 to .924, so the data do not provide strong evidence of anything.
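The arithmetic behind this example can be checked with a short sketch using only the Python standard library. The function names are hypothetical, and the Bayesian part assumes a beta(.50, .50) prior, which is the reading consistent with a posterior probability of about .96; the posterior is approximated by a simple midpoint rule rather than an exact beta distribution function.

```python
from fractions import Fraction
from math import comb


def binomial_p_value(responses, n):
    """P(X >= responses) when X ~ Binomial(n, 1/2): fixed sample size."""
    return Fraction(sum(comb(n, k) for k in range(responses, n + 1)), 2 ** n)


def negative_binomial_p_value(responses, failures):
    """P(at least `responses` successes before the `failures`-th failure), theta = 1/2.

    This event occurs exactly when the first responses + failures - 1 trials
    contain at most failures - 1 failures.
    """
    m = responses + failures - 1
    return Fraction(sum(comb(m, j) for j in range(failures)), 2 ** m)


def posterior_prob_above_half(responses, n, a=0.5, b=0.5, steps=20000):
    """Approximate Pr(theta > 1/2 | data) under a beta(a, b) prior (assumed)."""
    post_a = a + responses
    post_b = b + (n - responses)
    h = 1.0 / steps

    def kernel(x):
        # Unnormalized beta(post_a, post_b) density.
        return x ** (post_a - 1) * (1 - x) ** (post_b - 1)

    total = sum(kernel((i + 0.5) * h) for i in range(steps))
    upper = sum(kernel((i + 0.5) * h) for i in range(steps // 2, steps))
    return upper / total


p_binomial = binomial_p_value(9, 12)            # 299/4096
p_neg_binomial = negative_binomial_p_value(9, 3)  # 67/2048
posterior = posterior_prob_above_half(9, 12)

print(f"binomial p-value:          {float(p_binomial):.4f}")
print(f"negative binomial p-value: {float(p_neg_binomial):.4f}")
print(f"Pr(theta > .50 | data):    {posterior:.3f}")
```

The same 9 responses and 3 non-responses give two different p-values only because the stopping rule differs, while the posterior probability depends on the data and the prior alone.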
In your example, the people look at the interim data and decide to violate the design. This says that the actual design is something of the form "We will follow a formal group sequential design provided by a statistician, unless we don't like the decision that it dictates, in which case we will decide whatever we like." Additionally, it is very hard to interpret what actually is going on, since the data have been reduced to a single p-value, so a great deal of information is missing.
Given the above actual design, which allows the people to decide anything they like based on the interim data, computing or interpreting an empirical p-value is a fool's errand.
------------------------------
Peter Thall
Professor
Univ. of Texas-MD Anderson Cancer Center
Original Message:
Sent: 03-25-2016 08:50
From: George Skountrianos
Subject: Question regarding p-value interpretation in Group Sequential Designs
Hello everyone. I have a general question that I would appreciate any input on:
Suppose we design a typical group sequential design and to stop (for efficacy or futility) at the first interim analysis the result has to be significant at a nominal p-value of 0.002. Now further suppose that we observe a p-value of 0.025 at this stage. Two questions:
1. Per the GS boundaries we technically cannot stop, but how would we interpret this observed p-value? You can imagine a project team asking "yes this result is not significant at the 0.002 level but why can't we say it is significant at the 0.025 level?"
2. If the project team chooses to stop at this stage for reasons other than statistical ones (e.g. it is not operationally feasible to continue the project), can we use the p-value of 0.025?
So my overall question is: how do we interpret these observed interim p-values when they fall between the stopping bounds?
Thank you!
George
------------------------------
George Skountrianos
Statistician
Hollister Incorporated
------------------------------