
Does sample size really affect interpretation of p-values?

  • 1.  Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 10:24
    I attended a conference a few months ago.  During the discussion period a well-known statistician suggested that p-values are not as trustworthy when the sample size is small and presumably the study is underpowered. Do you agree?  The way I see it a p-value of 0.01 for a sample size of 20 is the same as a p-value of 0.01 for a sample of size 500.  Since the p-value is the probability of seeing a value as extreme as or more extreme than the observed test statistic when the null hypothesis is true, the sample size should make no difference.  The effect of sample size shows up in the difference between the null distributions for the two sample sizes.  I would like to hear other points of view.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------


  • 2.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 10:31
    I think it depends on what he or she meant by 'trustworthy'. If N is small, then a change of a single response could affect the p-value by a lot. If the results are not completely reliable (as in many fields) then perhaps saying the p-value is less "trustworthy" makes sense.



    -------------------------------------------
    Peter Flom
    -------------------------------------------








  • 3.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 12:42

    I was paraphrasing what was said.  I don't know what he said exactly.  So trustworthy was my term describing the conversation. I think the issue was sample size and not data reliability.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 4.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 13:31

    A p-value is a standardized version of the test statistic. The distribution of the p-value depends on n under the alternative hypothesis. The theorems on asymptotics (laws of large numbers, etc.) provide information regarding the impact of the sample size on the accuracy of the p-value for estimating its expected value. The question is the same as asking whether the sample size has an impact on the interpretability of the test statistic. I think that it has an impact on aspects of the test statistic, such as accuracy, but not interpretation.
    -------------------------------------------
    David Bristol
    Statistical Consulting Services
    -------------------------------------------








  • 5.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 14:35

    I guess I am rethinking why we are saying that the p-value estimated from the sample is not the true p-value.  The p-value depends on the observed value.  If the assumed null distribution is exact rather than asymptotic, then the calculated p-value, which is just the integral of the null distribution over the tail beyond the observed value, is the actual p-value.  The p-value has a uniform distribution under the null hypothesis (regardless of sample size) and, as David points out, under an alternative hypothesis its distribution changes with the sample size n.
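
    A minimal simulation sketch of the last point, assuming a two-sided z-test with known standard deviation 1 so that the null distribution is exact. The function names, the alternative mean of 0.2, and the number of simulations are illustrative choices, not anything from this thread.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value for H0: mean == mu0 when sigma is known."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate_p_values(n, true_mean, n_sims=2000, seed=1):
    """Draw n_sims samples of size n from N(true_mean, 1) and test H0: mean == 0."""
    rng = random.Random(seed)
    return [
        z_test_p_value([rng.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(n_sims)
    ]


def fraction_below(p_values, cutoff):
    """Share of simulated p-values that fall below the cutoff."""
    return sum(p < cutoff for p in p_values) / len(p_values)


results = {}
for n in (20, 500):
    null_p = simulate_p_values(n, true_mean=0.0)  # null hypothesis true
    alt_p = simulate_p_values(n, true_mean=0.2)   # assumed small true effect
    results[n] = {
        "null_below_0.05": fraction_below(null_p, 0.05),
        "null_below_0.5": fraction_below(null_p, 0.5),
        "alt_below_0.05": fraction_below(alt_p, 0.05),
    }
    print(n, results[n])
```

    Under the null, roughly 5% of the p-values should fall below 0.05 and roughly half below 0.5 at both sample sizes; under the alternative, the share below 0.05 should be far larger at n=500 than at n=20.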
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 6.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 10:57
    There's a difference between the true p-value and the p-value that we estimate.  When the assumptions of the test are met, the estimated p-value would be close to the "true" p-value.  For many tests, the estimated p-value is far from being a true reflection of the probability that we are trying to estimate when the sample size is small.  So, I would tend to agree that the p-value is not so good for small sample sizes, unless one is using an exact test or a test appropriate for the sample size.
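
    To make the gap between an approximate and an exact p-value concrete, here is a small sketch that assumes a one-sided test of a binomial proportion against 0.5. The function names and the two example data sets are hypothetical, and the normal approximation is deliberately used without a continuity correction.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_upper_tail(k, n, p0=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def normal_approx_upper_tail(k, n, p0=0.5):
    """Large-sample approximation to P(X >= k), no continuity correction."""
    z = (k - n * p0) / sqrt(n * p0 * (1 - p0))
    return 1.0 - NormalDist().cdf(z)


def relative_error(k, n):
    """How far the approximation is from the exact tail, relative to the exact tail."""
    exact = exact_upper_tail(k, n)
    return abs(normal_approx_upper_tail(k, n) - exact) / exact


# Hypothetical data: 9 successes out of 10, and 537 successes out of 1000.
for k, n in ((9, 10), (537, 1000)):
    print(n, exact_upper_tail(k, n), normal_approx_upper_tail(k, n), relative_error(k, n))
```

    With n=10 the approximate p-value is only about half the exact one, while with n=1000 the two agree much more closely, which is the sense in which the approximate value is less reliable in small samples.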

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 7.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 12:05
    Please fit this into the discussion. We teach new students that the size of the confidence interval is indicative of the "confidence" in the "decision to reject", and that the sample size is key in determining the width. The p-value depends on the width of the confidence interval.

    -------------------------------------------
    Patrick Spagon
    -------------------------------------------








  • 8.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 13:00
    That is a very good point about estimated vs actual p-value.  My point applies to the actual "unknown" p-value and not the estimated p-value.  Certainly the accuracy of the estimated p-value improves with increasing sample size. So I guess the point which is expressed well in this dialogue is that in small samples there is a greater chance that the true p-value could be grossly underestimated.  That point was not expressed in the discussion at the conference.

    I think this is an important point for statisticians to take note of, namely p-value estimates are more variable and less trustworthy in small samples.  The true p-value can more readily be highly over or underestimated in the sample when the sample size is small.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 9.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 15:43

    The p-value estimates the probability that the finding in the sample is simply due to chance, i.e., that we selected a non-representative sample from the universal set.  There is no "true" p-value, only a "true" difference in the universal set which we are estimating based on our sample.  I.e., our sample is taken from the universal set, and if we had the universal set available to us then the exact difference would be known, not estimated, and there would be no reason to calculate probability. 
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 10.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 16:35
    Edith: I think the distinction he is making is when an asymptotic distribution is used for the null distribution.  For example in the case of the difference of two means with "assumed" common standard deviation.  The pooled estimate of standard deviation is used.  Then the test statistic has an asymptotic standard normal distribution under the null hypothesis and if this asymptotic distribution is used to compute ("estimate") the p-value we can call this p-value an estimated p-value.  Since the sample size affects the accuracy of the estimated standard deviation it affects the accuracy of the estimated p-value.  But the impact is less than what I thought initially.
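
    A rough simulation sketch of this effect, assuming normal data with equal variances and a true null. It refers the pooled-SD statistic to the standard normal instead of the t distribution and counts how often the nominal 5% test rejects. The group sizes (5 and 100), the seed, and all function names are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def sample_variance(values):
    """Unbiased sample variance."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pooled_statistic(x, y):
    """Two-sample statistic using the pooled SD estimate (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * sample_variance(x) + (ny - 1) * sample_variance(y)) / (nx + ny - 2)
    return (fmean(x) - fmean(y)) / (pooled_var * (1 / nx + 1 / ny)) ** 0.5


def rejection_rate(n_per_group, alpha=0.05, n_sims=5000, seed=2):
    """Share of null-true simulations rejected when N(0, 1) is the reference distribution."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(pooled_statistic(x, y)) > z_crit:
            rejections += 1
    return rejections / n_sims


rates = {n: rejection_rate(n) for n in (5, 100)}
print(rates)
```

    The rejection rate at 5 per group should come out noticeably above the nominal 0.05, while at 100 per group it should sit close to 0.05. Using the exact t reference distribution removes this particular inflation when the data really are normal.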

    One of the reasons why I raised this question is because clients often come to me with results based on small samples.  The p-value indicates statistical significance but the referees complain about the small sample size.  I have tended to argue that the sample size is irrelevant and that the effect size was large enough to lead to a significant result with a small sample.  But if the p-value estimate is unreliable in small samples then the argument does not hold up.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 11.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 17:34
    I'll add my two cents worth. Most p-values from computer programs are based on the Central Limit Theorem and assume the sample size is large enough that the relevant elements of the test statistics are close enough to the normal distribution to give p-values close to the "true" p-values, which are the true probabilities of seeing a test statistic of the size observed or larger, smaller, or more extreme (depending on the null hypothesis) under the null hypothesis. So the true p-values only depend on the assumed null distribution, not any underlying true distribution. If the sample size is small enough that the Central Limit Theorem approximation is poor, then the p-values can be way off. With small samples, calculating exact p-values using the null distribution is often not difficult. Hope this is helpful. Margot

    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------


  • 12.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 19:47

    I find the discussion somewhat confusing, possibly because my statistical training is from the social sciences where we tend to learn rules for analysis by rote rather than fully understanding them as a graduate of a statistics or mathematics department would.  From a materialist perspective, the thing is, the only p-value we have is the estimated one.  On a material basis, we just have the sample that we actually have and it has a p-value associated with it for the hypothesis being tested.  Whatever the true value is, we don't know because we are using a statistical method of analysis instead of an exact method.

    So social scientists (at least sociologists and economists) are warned to be wary of small samples.  For example, we have a rule of thumb never to analyze social data from a sample of fewer than n=30 (while we are told that physical scientists can work with very small samples, such as n=12, because their data is much more definite than in the social sciences and the variability they deal with is often very small in relation to the sample means).  In running analysis on energy forecasts for an electric utility I have observed by running the identical analysis with different numbers of cases that if the sample size gets to be about 20 or below the R-squared is often ridiculously high but if you use a few thousands or tens of thousands of cases for the same analysis the R-squared is typically lower but realistic in a practical sense.  In our texts we are warned against "capitalizing on chance" by using small samples.

    Kahneman & Tversky in their formulation of the "Law of Small Numbers" (which is "big" in the social sciences, I suppose because it has to do with illusion and misperception in science, which is part of the subject matter of psychology and social psychology as well as of Social Studies of Science and Technology, an interdisciplinary area) show that small samples often give extreme results, one way or another.  Of course, this article is about belief (http://stats.org.uk/statistical-inference/TverskyKahneman1971.pdf) but they begin with a strong result from a small sample and then ask what size of sample would be needed to replicate the result and suggest that a much larger sample would be required.  Cohen makes a similar point: "...sample reliability...always depends upon the size of the sample."  "The larger the sample size, other things being equal, the smaller the error and the greater the reliability and precision of the results." [where "reliability" is the same thing as "precision" or the closeness of the sample value to the relevant population value] -- Cohen, Jacob, Statistical Power Analysis for the Behavioral Sciences, Second Edition. Hillsdale, NJ: Lawrence Erlbaum Associates 1988, Pp. 6-7.

    So, if you have a very big "effect size" and low variation relative to the sample mean for the test, a small sample might be OK, as is often the case with nearly exact data in the physical sciences.  But, if you are working with social science data or data that is created by a combination of physical and social variables like household energy use (to use a furnace or AC unit at all will determine a large component of a household's use but people and their situations are highly variable from year to year and a death or birth in a family or a child becoming a teenager or a young adult leaving for college or the armed services can also have major impact), sample sizes should be larger in order to "average out" a host of sources of potentially strong variation.  At least, if an interesting result is found using a small sample the study should be replicated with more data to see if the apparent p-value and the demonstrated effect size are really there and are not artifacts of a small sample size. 
    -------------------------------------------
    Hugh Peach
    H. Gil Peach & Associates, LLC
    -------------------------------------------








  • 13.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 00:16

    I don't like putting rules of thumb on sample size based on the type of data (e.g. physical measurement vs social science data).  There is no mystery to what an adequate sample size is. It depends on the standard deviation of the test statistic.  Specifically for the mean of a single sample it is the population standard deviation s divided by the square root of the sample size n.  It is the variance for the sample observations that dictates what an adequate sample size is.  Any connection to the type of data is indirect and is due to differences in variability.

    The idea of true vs estimated p-value should not be confusing.  If we have an exact null distribution and accept the modeling assumptions that lead to it, the calculated p-value is the true p-value.  If we are using an asymptotic null distribution or do not believe the modeling assumptions to be completely correct the calculated p-value is an approximation to the unknown true p-value.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 14.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 00:32
    The p value is about making inferences from the sample to the population.  So if we obtain a very large number of samples and compute the statistic and corresponding p value for each sample, we will have the "estimate" of the distribution of the p value for the statistic computed.  If we are comparing two treatments then this sampling distribution of the p value depends on the comparison, say of the two means.  If the two means are equal then the sampling distribution of the p value is uniform and if the means are not equal then the sampling distribution will look somewhat like that of a Beta skewed toward 0.  From this distribution we can determine the power.

    So back to the one sample and one p value.  The p value is the evidence you have from your sample that you will be incorrect if you conclude the null hypothesis to be false when in fact it is true.  Does sample size come into play?  It sure does, as you will more readily believe the evidence for or against the null hypothesis for larger sample sizes, but it is also related to the variability.  If the means are 40 and 50 and the standard deviation is .1, then I would have great confidence in my evidence with n=2 per group.  But if my standard deviation is 10 then I would not have great confidence in my decision with a small sample size.
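
    The 40-versus-50 example can be turned into a quick power calculation. This is only a sketch under a simplifying assumption that is not in the post: the common standard deviation is treated as known, so a two-sided two-sample z-test applies. The function name and the extra case of n=20 per group are illustrative.

```python
from statistics import NormalDist


def two_sample_z_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test with known common SD."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the z statistic under the alternative.
    shift = mean_diff / (sd * (2 / n_per_group) ** 0.5)
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# True means 40 and 50, so the true difference is 10.
for sd in (0.1, 10):
    for n in (2, 20):
        print(sd, n, round(two_sample_z_power(10, sd, n), 3))
```

    With a standard deviation of 0.1 the power is essentially 1 even at n=2 per group; with a standard deviation of 10 the same n=2 gives power well below 20%, and it takes something like 20 per group to get power in the high 80s.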

    -------------------------------------------
    George Milliken
    -------------------------------------------








  • 15.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 09:19
    I disagree about the sample size statements here.  If the true means are 40 and 50 and both true distributions have a std dev of 0.1, then we have plenty of power and any p-value that would be obtained would be fine since the distribution of the p-value in this case will be a beta that is heavily skewed towards 0.  However, if we obtain two samples, each of size 2, with sample means of 40 and 50, and with a common sample std deviation of 0.1, then I have no confidence in the p-value.  The distribution of the variance is so "bad" that I don't know if the two variances are truly equal and I have no true data to be able to determine how well this assumption fits.  I could use the Satterthwaite based t-test, which will give me the same result as assuming the variances are equal, but because the distribution of the variances is so "poor" with small sample sizes, I would not believe the result.  I want sample sizes large enough so that I at least have distributions with means and variances that exist and an n=2 does not result in distributions with a variance.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 16.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 10:47

    A good article for discussing the distribution of the p-value is:

         Murdoch, D, Tsai, Y, and Adcock, J (2008) _P-Values are Random
         Variables_. The American Statistician. (62) 242-245.

    Some simulations based on that paper can be easily done using the Pvalue.norm.sim and Pvalue.binom.sim functions in the TeachingDemos package for R.  Running these simulations can help visualize the fact that if the null is true and other assumptions are correct then the p-values follow a uniform distribution.  You can also see that if certain assumptions are wrong then the p-values will be biased away from 0.

    Another thing that can be seen that may apply more to the original question is that for the binomial with small sample sizes the number of possible p-values is fairly limited. For example, with n=10 and testing that the true proportion is 0.5, you can get a one-sided p-value of about 0.001 (10 successes), one of about 0.01 (9+ successes), and one of about 0.055 (8+ successes), but nothing in between, so you will never see a p-value of 0.04 and your true probability of a type I error is much less than 0.05 if that is used as the alpha level.  As sample size increases the set of p-values fills in to look more continuous.  This may be part of what the professor meant.
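
    The attainable p-values for this n=10 case can be listed directly. The sketch below assumes a one-sided (upper-tail) exact binomial test of a proportion of 0.5; the function names are made up for illustration.

```python
from math import comb


def upper_tail_p_values(n, p0=0.5):
    """Map each success count k to the one-sided exact p-value P(X >= k)."""
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    return {k: sum(pmf[k:]) for k in range(n + 1)}


def actual_type_one_error(n, alpha=0.05, p0=0.5):
    """Largest attainable p-value not exceeding alpha, i.e. the real size of the test."""
    attainable = [p for p in upper_tail_p_values(n, p0).values() if p <= alpha]
    return max(attainable, default=0.0)


p_values = upper_tail_p_values(10)
for k in (10, 9, 8):
    print(k, p_values[k])
print("actual size at alpha=0.05:", actual_type_one_error(10))
```

    The three smallest attainable values are 1/1024, 11/1024 and 56/1024, so a test run at a nominal alpha of 0.05 actually rejects with probability 11/1024, a little over 0.01.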




    -------------------------------------------
    Gregory Snow
    -------------------------------------------








  • 17.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 17:53
    The Tversky-Kahneman article was illuminating.  Thank you.  Interestingly, several of Tversky and Kahneman's points were also made by Peter H. Westfall and S. Stanley Young in their 1993 book, "Resampling-Based Multiple Testing" (John Wiley & Sons, Inc).  See the simulated multi-center oat-bran study in Section 1.1, pp4-9 of the book.  The context is that of multiple (simulated) researchers (ten of them) independently testing the same research hypothesis (that oat bran lowers cholesterol) on Treatment vs Control groups using 20 subjects per group per researcher.  In Westfall's and Young's simulation, the null hypothesis is true, yet two of the researchers obtain a statistically significant result in the desired direction...leading Westfall and Young to speculate that the two positive studies would have been published, which in turn would have led to additional studies trying to replicate them.  I think what's germane here is that it is easy to see how the false-positive problem arises when one has ten different researchers independently conducting the same experiment, and easy to see that the truth does not change if nine of those ten researchers choose not to conduct it.
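
    A back-of-the-envelope sketch of how often this happens, assuming ten independent studies with a true null. The per-study false-positive rate of 0.05 is an assumption for illustration and is not taken from the book; the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n_studies=10, alpha=0.05):
    """P(at least k of n_studies independent null-true studies are significant)."""
    return sum(
        comb(n_studies, j) * alpha**j * (1 - alpha) ** (n_studies - j)
        for j in range(k, n_studies + 1)
    )


print("at least one false positive:", prob_at_least(1))
print("at least two false positives:", prob_at_least(2))
```

    Under these assumptions the chance of at least one false positive among the ten studies is about 40%, and the chance of two or more is roughly 9%.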

    -------------------------------------------
    Eric Siegel
    Biostatistician
    Univ of Arkansas for Medical Sciences
    -------------------------------------------





  • 18.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 20:52
    I'm not sure that "the finding in the sample is simply due to chance" is the same thing as selecting "a non-representative sample from the universal set".

    One involves considering the probability of getting a result at least as extreme as the one you got, while the other implies perhaps more substantial (sampling) issues.

    -------------------------------------------
    Gabriel Farkas
    -------------------------------------------








  • 19.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 07:53
    Selecting a non-representative sample from the universal set is due to chance.

    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 20.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 12:40
    First, I agree with Edith's comment that there is no such thing as a 'true' p-value.  No statistics book ever posited such a thing.  There are population mean differences and there are population variances.  p-values are estimated from samples.  I'll come back to population values of p-values in a bit.

    Second, variability estimates for small samples ARE adjusted for by the t-test or other statistics.  Just look at the difference between the t-test critical values (when one estimates variability) and the z-score critical value (when the variability is known).  Using the ubiquitous 0.05 two-sided level, the critical t value has an asymptote at 1.96, which it approaches when N~30; with 6 d.f. the critical t value is 2.45.

    Third, when N is small or large, a p=0.01 says that the means are quite likely to be different, the difference is not zero.  Both results indicate 'statistical significance'.  End of discussion.

    Fourth, there are (and I hate to be so pejorative) ignorant users of statistics who confuse p-values and effect size.  Effect size can be simply measured by (mean difference)/standard deviation.  It is a sufficient statistic in all power analyses, along with alpha and beta to estimate N.  Effect size is independent of sample size (unlike the standard error or p-values [see below]), although its confidence interval is directly affected by N.  Given a sample size of 20 or 500, we can ask a power analysis (using a power of 50%), what treatment difference would yield a significant (at 0.01 two-sided) result.  When the variances (sd) are set to a standardized 1.0, the mean difference (effect size) is 0.852 when N=20 and 0.163 when N=500.  What do we conclude?  When the sample size is small, the means would be almost a standard deviation different - a large treatment difference.  When the sample size is large, the means need only to be trivially different (0.16) to achieve statistical significance.
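
    The two effect sizes can be approximated in a few lines. This sketch assumes N means the number per group, a two-sample comparison with standard deviation 1, and a normal approximation in place of the t distribution; at 50% power the detectable effect is simply the one that lands exactly on the critical value. The function name is illustrative.

```python
from statistics import NormalDist


def barely_significant_effect(n_per_group, alpha=0.01):
    """Standardized mean difference that sits exactly on the two-sided cutoff."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of a difference of two means with SD 1 is sqrt(2 / n).
    return z_crit * (2 / n_per_group) ** 0.5


for n in (20, 500):
    print(n, round(barely_significant_effect(n), 3))
```

    The normal approximation gives about 0.81 for N=20 and about 0.163 for N=500. The N=20 figure is a little below the 0.852 quoted above, presumably because that calculation used the t distribution, which matters more at small N.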

    Conclusion:  If two studies with different N's (N=20 or 500) had the same p-value (0.01), the SMALLER study indicates a VERY LARGE CLINICAL DIFFERENCE, an important finding.  However, the CI on this difference will be quite large.  The larger study has a smaller CI, but indicates a trivial clinical difference.  I would tell my client that the small study indicates a real effect which might be quite large and worthy of future follow-up studies and the large study indicates a real effect, which is likely quite small and not worth further investing.  [Although if the small effect size is the ONLY treatment available, then I'd recommend they do a huge Phase IIIb trial.]

    Fifth:  To return to population estimates of p-values, we need to understand what we are talking about - the null hypothesis.  We all should know that the null hypothesis is that the treatment difference is zero (or mean difference minus a constant is zero).  The difference could be any statistic (e.g., distribution, variability), but I'll focus on mean differences.  Let me ask a general question:  Can anyone think of any research question which any scientist/researcher ever believed has an observed mean difference of EXACTLY zero?  Let me operationalize that: a huge, huge study observed a difference smaller than 1/10^1,000,000,000,000,000,000,000.  Let me further elaborate on 'huge', a study with not 500, not 10,000, not a million, billion, trillion, but even larger sample size (e.g., centillion).  While we act like we are testing if the difference is exactly zero, in practice no difference in treatments is EXACTLY zero.  Think of the number line, with it measuring the difference of two different variable treatments.  What is the likelihood that the difference of these two variables is ever a single point of zero?  It may be practically quite small but the difference between two variables is almost never a single value of zero.

    When the difference, albeit small, is unlikely to be exactly zero, then the t-test (hence p-values) is a function of (square root of) N/group.  Let me illustrate this with a very small effect size (0.10) with the common 2-sided 0.05 alpha level comparing two independent samples.  With a small sample size of 4, the p-value is 0.90.  When N/group=92 the p-value is 0.50, when N/group=771 the p-value is 0.05, when N/group=3,036 the p-value is 0.0001.  In sum, when any non-zero difference exists, p-values could be anything.  As N increases any non-zero difference will become statistically significant, using any level of significance (< 0.05, < 0.01, ... , < 0.000000000000001).
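
    These p-values can be roughly reproduced with a normal approximation. The sketch assumes the observed standardized difference equals exactly 0.10 in every study and that the standard deviation is known; the function name is an illustrative choice.

```python
from statistics import NormalDist


def two_sided_p(effect_size, n_per_group):
    """Approximate two-sided p-value when the observed standardized difference is effect_size."""
    # Two-sample z statistic: difference divided by its standard error sqrt(2 / n).
    z = effect_size * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(z))


for n in (4, 92, 771, 3036):
    print(n, two_sided_p(0.1, n))
```

    The same observed effect of 0.10 goes from a p-value near 0.9 at 4 per group to about 0.5 at 92, just under 0.05 at 771, and about 0.0001 at 3,036.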

    Sixth:  p-values only answer whether the difference is not zero.  They COMPLETELY ignore the most important question: what is the difference?  Is the difference (clinically) meaningful?  If one understands the metric, that answer can only be obtained from the CI of the difference.  We all should know that if the p-value is < 0.05 then the 95% CI will not include zero.  The CI is how we understand the magnitude of the difference, NOT THE P-VALUE.  If we don't understand the metric of the parameter, I recommend computing the CI on the effect size (using the non-centrality parameter).

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------



  • 21.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 13:05

    I am in general agreement with Allen's comment.  I meant to express the same idea, except using different words.  The p-value would not be a useful measure of statistical significance if its validity depended on the sample size.  In fact, statistical tests are constructed in such a way as to only produce a significant p-value when the observed effect is large enough given the size of the sample (the tests are adjusted for sample size).  I.e., in a small sample the statistical power for detecting a significant difference will be proportionately lower, making it proportionately less likely to detect a small difference than when the sample is large.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 22.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 15:04

    I totally agree that a small effect size needs a large sample size to achieve 'statistical significance'. 

    I also agree that with a constant N (and experimental design), p-values are then (and only then) related to effect size.  Although, there are ways to influence the variability to increase the effect size.  For example, a contralateral study with 50 patients typically yields better effect sizes than a study with two independent samples of 25 patients each.

    However I would say that the effect sizes are the thing we statisticians are given.  The correct method to compute power is to ask our clients to review their own literature and come up with the expected treatment difference(s).  I then compute the sample size to achieve an arbitrary p-value and power.  The worst clients I have ask the question, 'If I only have a budget for x patients, what mean difference can I see?'

    I strongly disagree with 'The p-value would not be a useful measure of statistical significance if its validity depended on the sample size.'  Given any non-zero difference, p-values can take on almost any value (given granularity of data - e.g., the binomial as discussed earlier by Gregory Snow) as a direct function of N.  See my example of an effect size of 0.1.  To demonstrate that, take any non-zero effect size, put in any alpha and beta (use 0.50 for just barely significant), and out will pop a sample size to achieve that p-value.  p-values are a direct function of sample size, all other things constant.

    Finally, I do agree that the other assumptions of the model are appropriate, but often of trivial importance given a sound design.  For example, heteroscedasticity has been PROVEN to be unrelated to alpha when the Ns of the two treatments are equal.  Or non-normality is irrelevant (from the central limit theorem) when the Ns are > 20.  There have been many early Monte Carlo studies demonstrating these observations.  I personally try to avoid the non-parametric testing simply because it precludes useful CI.  For example, when was the last time one did a Wilcoxon test and reported the CI for the difference in the mean ranks?  If you strongly disbelieve that your data are interval, one should not report means and their confidence intervals.  If you strongly disbelieve that your data are ordinal, one should not report medians and their CI.  Independence of the samples is the one assumption I personally feel is crucial (e.g., using a viable correlated error structure when one has repeated measurements).

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------








  • 23.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 14:09
    Response to the 1st point: My point about "true" vs. "estimated" p-value was well summarized by others in that the p-value we calculate depends on the assumptions being met.  As such, statistics books do discuss the effects of deviations from assumptions on the properties of a test.  The question is whether the sample size "interacts" with these deviations from assumptions to cause our calculated p-value to deviate from the p-value one would obtain if the assumptions were in fact met.  We use many tests that are only asymptotically correct, and as such, sample size affects the accuracy based on the rate of convergence.  I'm not certain, but I also suspect assumptions such as independence of samples would play a similar role.  If samples are not independent then the "effective" sample size is reduced.  Reducing the sample size of a small study would have a larger "effect" than reducing the sample size of a large study (e.g., N=500).

    Response to the 2nd point: I agree that many tests, such as the t-test, do take into consideration the variability.  However, conducting a simple t-test does not mean you get a good p-value.  An example of a two-sample t-test makes my point well.  One has to either assume that the population variances are equal or not.  In reality the population variances may be close enough to one another that the assumption of homoscedasticity has no appreciable effect on the distribution of the test statistic.  However, in reality, the variances may not be very similar and the assumption of homoscedasticity would have a large effect on the distribution of the test statistic.  Many recommend always conducting a Welch two-sample t-test since the "penalty" in degrees of freedom of this approach takes into account the difference in the estimated variances.  Now, here is where I think p-values for small samples will be less reliable.  If we knew that the two populations were normally distributed with the variances equal, then we would get a reasonable p-value when the sample size is small (e.g., N<10) regardless of the test we choose.  However, if the two distributions are not normally distributed and the variances are not equal, then our t-test will perform fairly poorly when the sample size is small regardless of the version of the t-test we use.  Remember that the test statistic only has a t-distribution when the data are normal since the denominator is only chi-squared when the data are normal.

    Response to 3rd point:  I agree that the meaning of the p-value is the same regardless of the sample size.  The question is not whether it has the same meaning, but whether the p-value we calculate accurately reflects the probability we want.

    Response to points 3-6: The meaning of a p-value is the probability, IF the null is true (e.g., mean 1 = mean 2), that we would obtain the observed difference or a difference more extreme.  This meaning does not depend on whether the null IS true.  The distribution of the p-value, however, does depend on whether the null or the alternate is true.  The question that is relevant for sample size is whether the distribution of the p-value for a given statistic is (close to) uniform when the assumptions are not met.  Further, we also want the distribution of the p-value to be a beta distribution when the alternate is true when the assumptions are met.  If these distributions do not hold, then the p-value that one calculates is not reliable.
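
    The calibration question raised above can be checked by simulation. The following is only a minimal sketch under stated assumptions: a one-sided z-test with known variance stands in for "a test that is only asymptotically correct", the skewed population is taken to be exponential, and the function name, seed, and simulation count are illustrative choices, not anything from this thread.

```python
import random
from statistics import NormalDist


def upper_tail_rejection_rate(draw, mu, sigma, n, alpha=0.05, sims=20000, seed=1):
    """Share of simulated null samples whose one-sided z-test p-value is <= alpha.

    The null hypothesis (mean == mu) is true in every simulated sample, so a
    well-calibrated p-value should give a rate close to alpha.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(sims):
        sample_mean = sum(draw(rng) for _ in range(n)) / n
        z = (sample_mean - mu) / (sigma / n ** 0.5)
        p_value = 1.0 - std_normal.cdf(z)
        if p_value <= alpha:
            rejections += 1
    return rejections / sims


# Normal data: the z statistic is exactly standard normal, even with n = 5.
normal_rate = upper_tail_rejection_rate(lambda rng: rng.gauss(0.0, 1.0), 0.0, 1.0, n=5)

# Exponential data (mean 1, sd 1): the normal approximation is only asymptotic.
skewed_rate = upper_tail_rejection_rate(lambda rng: rng.expovariate(1.0), 1.0, 1.0, n=5)
skewed_rate_n50 = upper_tail_rejection_rate(lambda rng: rng.expovariate(1.0), 1.0, 1.0, n=50)

print("normal data, n=5:      ", normal_rate)
print("exponential data, n=5: ", skewed_rate)
print("exponential data, n=50:", skewed_rate_n50)
```

    With normal data the rejection rate should sit near the nominal 5% at any n. With the skewed population the small-sample rate drifts away from 5% and moves back toward it as n grows, which is the interaction between sample size and assumption violations described in the post.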

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 24.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 16:34

    Robert, I think we are talking about two different things: sample size vs. assumptions of the statistical tests.  If one violates the assumptions of the statistical test, the p-value will be distorted.  However, if one does not violate the assumptions of the statistical test, the sample size should not affect the validity of the p-value.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 25.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-02-2012 21:16
    I would agree if the impact of the deviation from assumptions was the same for all sample sizes.  However, I don't think the impact is the same for all sample sizes.  The overall point that I am making is exactly this. We have no real way to know how well our assumptions hold, especially with small sample sizes, and it is when we have small sample sizes that the p-values that we calculate are more likely to deviate from the "ideal" because of the impact of deviations from assumptions. 
    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 26.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 22:49

    Good job, Allen!
    Power, significance, and sample size are all related.  There have been a lot of articles written about the issue of reporting statistical significance in journal articles both in medicine and the social sciences, for the reasons Allen points out.  A small, meaningless effect can be highly significant when the sample size is large.  On the other hand, a very small sample has low probability to reject a false null hypothesis, resulting in Type II errors.  For that reason there is a push in the research literature to include effect sizes and confidence intervals in addition to p-values.  If you do a search you will find many interesting articles about statistical significance and the problems of reporting only p-values.

    -------------------------------------------
    Nora Galambos
    Stony Brook University
    -------------------------------------------








  • 27.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 07:59

    The discussion on p-value interpretation has been interesting and turned into a lot more than I expected when I read Michael Chernick's initial post. I don't have anything more to add about p-values, and thanks to everyone who posted. I did want to share an interesting coincidence. Hugh Peach's post on Sunday referred to a 1971 Tversky and Kahneman paper. Earlier that day I was watching Fareed Zakaria's show on CNN. His guest was the same Daniel Kahneman, a psychologist and Nobel Prize recipient (for economics in 2002, shared with Tversky), and the discussion was on his new book, Thinking, Fast and Slow. (See http://www.nytimes.com/2011/11/27/books/review/thinking-fast-and-slow-by-daniel-kahneman-book-review.html?pagewanted=all.) For everyone's amusement, here's a question Kahneman has used in lectures and Sunday on CNN: A bat and a ball together cost $1.10. The bat costs a dollar more than the ball. How much does the ball cost?

    -------------------------------------------
    Richard Bittman
    President
    Bittman Biostat, Inc.
    -------------------------------------------








  • 28.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 08:08
    Richard asked:
    A bat and a ball together cost $1.10. The bat costs a dollar more than the ball. How much does the ball cost?

    I assume the fast answer is  -- ball costs 10 cents, which is wrong, and the slow answer is  -- ball costs 5 cents?
    -------------------------------------------
    Susan Hilsenbeck
    Baylor College of Medicine
    -------------------------------------------








  • 29.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 09:48
    Good Susan, I believe you got the right answer.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 30.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 09:56
    I get the correct answer by solving two linear equations in two unknowns.

    x+y =1.10
    x-y =1.00
    implies 2y = 0.10 so y=0.05.

    Is this your slow way or do you work it out mentally?
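
    For anyone who wants to check the "slow" answer rather than trust it, here is a small sketch of the same two equations, using exact fractions so no rounding enters. The variable names are illustrative.

```python
from fractions import Fraction

total = Fraction(110, 100)       # bat + ball, in dollars
difference = Fraction(100, 100)  # bat - ball, in dollars

# Subtracting the second equation from the first gives 2 * ball = total - difference.
ball = (total - difference) / 2
bat = (total + difference) / 2

print("ball:", float(ball), "bat:", float(bat))
```

    The check Kahneman recommends is then one line: bat + ball should equal 1.10 and bat - ball should equal 1.00.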

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 31.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 10:27
    I worked it out mentally, but fast. And I got the wrong answer. The other thing Kahneman said Sunday is if you don't work it out analytically (slow) at least check the result before declaring it, even better advice I think. As I recall, he said 85% of the MIT students given the question got it wrong.

    -------------------------------------------
    Richard Bittman
    President
    Bittman Biostat, Inc.
    -------------------------------------------








  • 32.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 11:57


    A small correction:  Amos Tversky passed away before the Nobel was awarded to Dan Kahneman.  Since Nobels are given only to the living, he did not technically share in it.  In a more substantive sense, though, Richard is correct in that everyone including Kahneman agrees it was really a joint award to the two of them as long-time collaborators.

    -------------------------------------------
    David Lyon
    Aurora Market Modeling, LLC
    -------------------------------------------








  • 33.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 15:35
    Robert, I don't believe that there is a "true" p-value vs. an estimated p-value.  The p-value is the probability that the observed difference occurred due to chance.  The smaller the sample size the less likely it is that a difference that exists in the universal set will be detected in that sample.   In a large sample we are more likely to detect a small existing difference - simply because the larger the sample the more representative it is of the universal set.

    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 34.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-01-2012 15:25


    I agree that the interpretation of a p-value is the same regardless of the sample size. If you get a significant p-value with a low sample size, it just means that you had enough power to detect the difference.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 35.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 10:19
    The excellent discussion so far has focused on theoretical considerations, but there is a practical consideration too.  A good statistician doesn't simply report a p-value (or, better, a confidence interval, if possible); he or she must also confirm that the assumptions supporting it are valid.  For an exact test the assumptions are minimal but not nonexistent.  For a t-test with small N the data must be normal.  If N is too small, one may not have enough data to do diagnostics well enough.  In this sense the conclusion is not reliable, even though there is no theoretical issue with the test itself operating on small N.

    Of course "N sufficient to check assumptions" is a mathematically fuzzy statement--and should probably remain so.

    -Jim

    -------------------------------------------
    James Garrett
    Manager, R&D Statistics
    Becton Dickinson
    -------------------------------------------








  • 36.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 10:53


    "Of course "N sufficient to check assumptions" is a mathematically fuzzy statement--and should probably remain so-"

    I agree with James. As an example, if the data are quite skewed, as with particle data in the semiconductor business, then even the standard "n >= 30" is insufficient for the central limit theorem to make the distribution of averages normal.
    ------------------------------------------
    Patrick Spagon
    -------------------------------------------








  • 37.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 13:24

     

    I'd like to add a new wrinkle to this discussion of p-values.  It is interesting that a barely significant or barely nonsignificant p-value is taken as a credible indication of statistical significance or nonsignificance.  Please examine these simple results and see what I mean.

     

    I had a set of real data comparing the length of spinal fusion in two well-defined groups. Group 1 had n=20 and Group 2 had n=23, so the sample size seemed to be large enough that any conclusion would seem plausible.  The t-test p = 0.0446.  I got curious as to what would have happened if one patient had not shown up for the study. Using round-robin deletion, the resulting p-values ranged from 0.032 to 0.082, with 72% of p-values > 0.05.  So, if one patient was lost to follow-up, 72% of the possible outcomes would have suggested no significant difference, or an inability to say which group had the higher mean length.

    FYI: If any two were lost to follow-up, the p-values would have ranged from 0.005 to 0.16. 

     

    To see if a non-parametric method (Kruskal-Wallis) would not have this problem, the complete data set had p=0.036; excluding one record, the p-values ranged from 0.019 to 0.055; excluding two records, the p-values ranged from 0.006 to 0.085.  While the K-W test had less of a problem, a borderline significant p-value is still sensitive to who is and is not included in the analysis.

     

    So, we see that conclusions based on marginally significant or non-significant p-values might simply depend on whether one or two patients were not lost to follow-up; I call this "fragile findings of statistical significance."  This is rarely mentioned in textbooks, where the computed p-value is the final story.
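
    The round-robin deletion described above is easy to reproduce. The sketch below uses simulated data, NOT the spinal fusion data, which are not given in this thread: the group means, standard deviation, and seed are made-up assumptions, and the pooled-variance t-test is one plausible reading of "the t-test". The t p-value is computed by numerically integrating the t density so that only the standard library is needed.

```python
import math
import random
from statistics import mean, variance


def two_sided_t_pvalue(t, df, steps=2000):
    """Two-sided Student t p-value via Simpson's rule on the density over [0, |t|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central_area)


def pooled_t_pvalue(a, b):
    """Two-sample t-test p-value assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return two_sided_t_pvalue(t, df)


# Hypothetical data with the same group sizes as in the post (n = 20 and n = 23).
rng = random.Random(2012)
group1 = [rng.gauss(10.0, 2.0) for _ in range(20)]
group2 = [rng.gauss(11.3, 2.0) for _ in range(23)]

full_p = pooled_t_pvalue(group1, group2)

# Drop each patient in turn and recompute the p-value.
loo_p = [pooled_t_pvalue(group1[:i] + group1[i + 1:], group2) for i in range(len(group1))]
loo_p += [pooled_t_pvalue(group1, group2[:j] + group2[j + 1:]) for j in range(len(group2))]

share_above = sum(p > 0.05 for p in loo_p) / len(loo_p)
print("full-sample p:", round(full_p, 4))
print("leave-one-out range:", round(min(loo_p), 4), "to", round(max(loo_p), 4))
print("share of leave-one-out p-values > 0.05:", round(share_above, 2))
```

    Reporting the leave-one-out range next to the full-sample p-value is one simple way to show a client how fragile a borderline result is.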

     

    If we go back to Fisher, he thought that a single significant result was not the end of the story.  He believed that repetitions of the study with the same conclusion would provide the basis for having faith in the reported finding. The Neyman-Pearson notion of Type I and Type II error rates, with α = 0.05, has encouraged the use of one-shot studies (plus the time and cost of doing so).

    Regarding sample sizes and p-values, I published an article (Am Stat 2010;64(1):30-33.) that shows how the interpretation of p-values using P(X<Y) is wholly dependent on sample size.

    -------------------------------------------
    Richard Browne
    Texas Scottish Rite Hospital for Children
    -------------------------------------------











  • 38.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 14:18
    I have been reading the discussions thus far and it has been very interesting. I would like to share two comments with all.

    The title of this thread asks: does sample size affect the interpretation of p-values? Many posts talk about estimated p-values, where asymptotic distributions are used to calculate the p-values. My impression from reading the previous posts is that the conclusion seems to be: yes, the interpretation of p-values does depend on sample size, because when the sample size is too small the asymptotic distribution assumptions don't hold, so the resulting p-values are not reliable. However, this situation is not just a question of the relationship between sample size and p-values; it is more a question of whether the asymptotic results are reliable, including the point estimates, confidence intervals and p-values. So if the sample size is too small for the asymptotic distributions to hold, all the results (including p-values) obtained from those distributions are not reliable.

    From there, I thought about the interpretations of p-values under exact distribution. Personally, I think the interpretation of P-values should be in context of the sample sizes, no matter whether it is from the asymptotic distribution or exact distribution. Consider the following simple examples: X follows binomial (n,p). H0: p=0.5 vs. H1: p>0.5. Now we have the observed following results:

    1) n=4, x=4, p-value=0.06, p_hat=1
    2) n=5, x=5, p-value=0.03, p_hat=1
    3) n=20, x=15, p-value=0.02, p_hat=0.75

    Imagine we need to interpret the results for some non-statisticians; how should we interpret the p-values in these three situations? If we only follow p-values, 2) and 3) are both significant, while 1) is non-significant. However, the estimated differences for 1) and 2) are both 0.5 (1-0.5), but the different sample sizes changed the result from non-significant to significant. Also, 3) only has a difference of 0.25 (0.75-0.5), so which situation is more significant?
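
    The three p-values above can be reproduced with the exact binomial upper tail. This is a minimal sketch; the function name is illustrative.

```python
from math import comb


def binom_upper_pvalue(n, x, p0=0.5):
    """Exact one-sided p-value P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(x, n + 1))


for n, x in [(4, 4), (5, 5), (20, 15)]:
    p_hat = x / n
    print(f"n={n:2d} x={x:2d} p_hat={p_hat:.2f} "
          f"difference from 0.5={p_hat - 0.5:.2f} p-value={binom_upper_pvalue(n, x):.4f}")
```

    Printing the estimated difference next to each p-value makes the point of the example visible: cases 1) and 2) share the same estimate but fall on opposite sides of 0.05, while case 3) has a smaller estimate and the smallest p-value.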

    -------------------------------------------
    Caiyan Li

    -------------------------------------------








  • 39.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 14:25
    It sounds like your study had a power of around 51%.  So half the time the study would be statistically significant.  If you drop a patient, as your excellent and thorough supportive analysis illustrated, you frequently lack the power to be statistically significant.  If you dropped more than one, the study is less likely to be significant.  etc.  What did we learn?  Studies with 51% power are barely significant.  When you drop patients, the p-value tends to non-significance.  Your K-W analysis demonstrated that the t-test assumptions were not the cause of the results.

    What conclusions with regard to the length of the spinal fusion?  The lower end of the CI, for the full sample indicated that the difference in spinal fusion length was not zero.  I hope the mean improvement was always in the correct direction and meaningful, even when you did the drop of each single patient.  The lower CI of the difference often included zero, when you dropped the patient.  The upper end of the CI might indicate that the size of the difference is potentially of clinical importance.

    In sum, I would suggest the client might feel comfortable with the study results - it was not zero, but future studies should be planned with at least 80% power.  You didn't mention your observed mean difference and standard deviation.  Shouldn't they be the focus of your report???

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------








  • 40.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 11:17
    There's no question that a p-value of 0.0446 is a borderline result, but bringing in a post hoc power calculation to explain this sets a very bad precedent.

    The abuse of power: the pervasive fallacy of power calculations for data analysis. John M Hoenig, Dennis M Heisey. The American Statistician 2001: 55(1); 19-24.

    By the way, the correct answer to someone who claims that a p-value based on small sample sizes is not "trustworthy" is that they are wrong. A p-value is valid at any sample size because it preserves the Type I error rate. It may not do a lot of other things well, but it wasn't designed to do those things well. Asymptotic considerations are a red herring because we all use nonparametric tests and randomization tests when the sample size is small, at least as a sensitivity check.
    -------------------------------------------
    Stephen Simon
    Independent Statistical Consultant
    P. Mean Consulting
    -------------------------------------------








  • 41.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 11:47
    I am basically in agreement with Stephen's summary.  The p-value should be used as a measure of significance regardless of sample size, as long as the assumptions of the statistical test are met and the test is correctly applied.  The fact that relatively small effect sizes can be significant if the sample size is large is to be expected, since the level of precision is greater when based on a large sample, and the opposite is true with small sample sizes. But that does not in any way invalidate the p-value as long as the effect size is understood.  Also, very small sample sizes generally violate the assumptions of parametric tests, which again does not invalidate the p-value but does invalidate the incorrect application of statistical tests.
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 42.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 12:25

    I agree that post hoc power has little true meaning.  But I am somewhat converted on the idea that p-values are a concern in small samples.  I don't think you should say "we all use nonparametric tests and randomization tests when the sample size is small."  While asymptotics is a problem in small samples I am not sure how a nonparametric test becomes a sensitivity check.  Suppose a two sample t test gives a significant p-value but the corresponding Wilcoxon rank sum test does not.  Does that say that the t test is wrong?  Not necessarily.  It could be that the parametric normal model is correct and the nonparametric test just lacks the power to detect the difference.  It creates a dilemma because the sample size is not large enough to do a test for normality.

    I don't see the asymptotics as being a red herring.  But I think the lack of robustness issue is more compelling.
    Suppose we test a coin to see if it is fair and toss it 5 times.  We get 4 heads and one tail but misrecord the tail as a head.  So we calculate the one-sided p-value for 5 heads and get 0.03125.  Had we gotten the correct result of four heads and one tail the p-value would be 0.03125 + 0.15625= 0.18750.  An error in one result makes the difference between a significant p-value and a very nonsignificant one. On the other hand if we had 100 tosses with 99 heads and 1 tail compared to an erroneous count of 100 heads the difference in p-values would be negligible and we would conclude that the coin is biased in favor of heads in either case.
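
    The coin example can be verified exactly. The sketch below uses exact fractions for a fair coin; the function name and the loop over the two scenarios are illustrative.

```python
from fractions import Fraction
from math import comb


def heads_pvalue(n, heads):
    """Exact one-sided p-value P(X >= heads) for a fair coin tossed n times."""
    return Fraction(sum(comb(n, k) for k in range(heads, n + 1)), 2 ** n)


# (tosses, heads as misrecorded, heads actually observed)
for n, recorded, actual in [(5, 5, 4), (100, 100, 99)]:
    p_recorded = heads_pvalue(n, recorded)
    p_actual = heads_pvalue(n, actual)
    print(f"n={n}: recorded p={float(p_recorded):.5g}, "
          f"actual p={float(p_actual):.5g}, "
          f"absolute change={float(p_actual - p_recorded):.5g}")
```

    With 5 tosses the single recording error moves the p-value across the 0.05 line; with 100 tosses both p-values are so small that the conclusion is the same either way.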

    So whereas before the discussion I would have agreed with Stephen and Edith, now I do not.

    As an aside, for small or large sample sizes I would do both the F test and the Wilcoxon because it is so easy to do using the SAS NPAR1WAY procedure. In large samples I would probably also do a Shapiro-Wilk test for normality.  But I would not presume that every statistician would do exactly what I do.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 43.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 14:00

    Michael, your example only demonstrates that the p-value based on a small sample is proportionately more distorted by a given number of data errors than a large sample would be.  That is not saying that the p-value is distorted because it is based on a small sample, only that a single erroneous measure will distort a small sample proportionately more than it would a large sample.  In fact, the distortion due to data errors would be more correctly expressed as a percentage rather than a number, i.e. the effect of a 10% error rate in a sample of 10 (=1/10 wrong measures) compared against a 10% error rate in a larger sample (=10/100 wrong measures).  This, again, is not showing that p-values are invalid when used in small samples, only that they are proportionately more likely to be affected by data errors, the wrong use of statistical tests, incorrect interpretation of the results with regard to effect size, etc.; and perhaps most importantly, they are less likely to detect an existing effect size that is not very large. Also, there is a continuum between a sample of 1 vs. a sample of >10,0000, so where should statisticians draw the line for the p-value being unreliable?
    -------------------------------------------
    Edith Zang
    Independent Consultant
    NYCASA
    -------------------------------------------








  • 44.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 18:09
    That was an interesting article in the American Statistician by Hoenig and Heisey, thank you for pointing it out.  However the authors' main thrust was on using a post-hoc power analysis to explain non-significant results, not statistically significant results.  From their summary, they said, "There is also a large literature advocating that power calculations be made whenever one performs a statistical test of a hypothesis and one obtains a statistically nonsignificant result. ... This approach, which appears in various forms, is fundamentally flawed."

    I observed that the statistically significant result demonstrated that the results excluded zero, but when the results are near 0.05, removing a single observation can and will change the p-value.  My power analysis stated that a properly designed larger study with 80% power (twice the size of the current one) would be far, far less sensitive to such 'fragility'.  Power analyses should also be presented for planning future studies.  They need to know that future studies, if they want consistently positive results, might need the larger sample size.  The discussion on 'fragility' should emphasize that although the results were 'significant', they could easily have been non-significant, due to the inadequate N.  They basically 'lucked out', this time!!!  The next time, with N=20, they might not be so lucky.

    It is also encouraging that Hoenig and Heisey, in their discussion, suggested that hypothesis testing be de-emphasized and greater use of parameter estimation and confidence intervals be made.  "Introductory statistics classes can focus on characterizing which parameter values are supported by the data by emphasizing confidence intervals more and placing less emphasis on hypothesis testing.   ...  With traditional frequentist statistics, this is best achieved with confidence intervals, appropriate choices of null hypotheses, and equivalence testing. Confusion about these issues could be reduced if introductory statistics classes for researchers placed more emphasis on these concepts and less emphasis on hypothesis testing."

    That was also the thrust of my entire discussion, less emphasis of p<0.05 and more emphasis on computing point estimates and CI.   It is especially important to emphasize the magnitude of the effects when the study is exploratory (i.e., small).  

    With regard to the lengthy discussion on model sensitivity and assumptions, let me quote George Box, "All models are wrong, but some are useful."  The magnitude of the possible errors due to non-normality or heteroscedasticity would be minimal when N/group~21 and equal Ns are used, respectively.  In such a study, our clients would strongly benefit, not from lengthy discussions of model assumptions or Monte Carlo p-value sampling re-estimations, but from the point estimates and CI.

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------








  • 45.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-05-2012 09:42

    Having gotten to this late, let me add my two cents to this interesting exchange.

    The interpretation of a P-value is the same, regardless of sample size.  In its ideal definition for simple hypotheses, it is the probability of seeing evidence at least as extreme as that actually observed when the null hypothesis is true.  For composite hypotheses it is the same if the nuisance parameters can be eliminated, or it is the maximum probability for situations where nuisance parameters remain.  So this definition is constant, regardless of the setting, provided, of course, the methods used are correct, and the hypothesis is not a product of selective reporting.

    Here is an example:  A trial with binary outcomes randomizes 10 patients to each treatment, and the outcomes are 6/10 vs. 1/10 successes.  The observed Z (pooled variance) is 2.344, and the "exact unconditional P-value" is 0.023.  This is Max P[|Z| >= 2.344 | 0 < P1 = P2 < 1].  See Suissa-Shuster JRSS (1985).  This assumes this method was prospectively selected before the trial began.

    Had the planners elected the much less powerful Fisher's "Exact" CONDITIONAL test, their P-value would have been 0.057. 

    In the first case, if the target population proportions are truly equal, no matter what they are, there is at most a 2.3% chance that the pooled Z would exceed 2.344 in absolute value.  However, the conditional P-value interpretation also has the nasty caveat that if you had replicated the study in a population where the null is true, and if you observed a total of 7 successes and 13 failures, you have a 5.7% chance of observing a discrepancy of at least 5 in the number of successes in either direction.
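
    Both P-values in this example can be recomputed from first principles. What follows is a rough sketch, not the Suissa-Shuster implementation: the unconditional maximum is approximated by a grid search over the common success probability, and the function names, grid size, and tie tolerance are assumptions made for illustration.

```python
from math import comb, sqrt

N = 10  # patients per arm


def pooled_z(x1, x2, n=N):
    """Z statistic for the difference in proportions, using the pooled variance."""
    pooled = (x1 + x2) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variability at all: treat as no evidence
    return (x1 / n - x2 / n) / sqrt(pooled * (1 - pooled) * 2 / n)


def fisher_two_sided(x1, x2, n=N):
    """Conditional P-value: given the total successes, the chance of a gap at least as large."""
    total, gap = x1 + x2, abs(x1 - x2)
    ways = 0
    for a in range(max(0, total - n), min(n, total) + 1):
        if abs(a - (total - a)) >= gap:
            ways += comb(total, a) * comb(2 * n - total, n - a)
    return ways / comb(2 * n, n)


def unconditional_pvalue(x1, x2, n=N, grid=999):
    """Approximate max over the common p of P(|Z| >= observed |Z|), via a grid search."""
    z_obs = abs(pooled_z(x1, x2, n))
    extreme = [(a, b) for a in range(n + 1) for b in range(n + 1)
               if abs(pooled_z(a, b, n)) >= z_obs - 1e-9]  # tolerance keeps exact ties
    best = 0.0
    for i in range(1, grid + 1):
        p = i / (grid + 1)
        prob = sum(comb(n, a) * comb(n, b) * p ** (a + b) * (1 - p) ** (2 * n - a - b)
                   for a, b in extreme)
        best = max(best, prob)
    return best


print("pooled Z:", round(pooled_z(6, 1), 3))
print("Fisher conditional P:", round(fisher_two_sided(6, 1), 3))
print("unconditional P (grid approximation):", round(unconditional_pvalue(6, 1), 3))
```

    The conditional calculation fixes the total at 7 successes, while the unconditional one lets both arms vary and then takes the worst case over the nuisance parameter, which is why the two answers differ for the same data.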

    So the P-value, for what it is worth, when used correctly, has the same interpretation regardless of the design. But it is only one piece of the puzzle, and the points made about power, effect size and interval estimation are more critical to the process than the P-value itself. 

    -------------------------------------------
    Jon Shuster
    University of Florida
    -------------------------------------------








  • 46.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-05-2012 10:48

    The point is not exactly interpretation but rather trustworthiness.  An article related to this issue was brought to my attention.  In the book "Pharmaceutical Statistics Using SAS: A Practical Guide," O'Brien and Castelloe have a chapter titled "Sample-Size Analysis for Traditional Hypothesis Testing: Concepts and Issues."  In it they discuss what they call crucial Type I and Type II error rates. These were originally defined by Lee and Zelen (2000) to be

    α* = Prob(H0 true | p ≤ α) = α(1-γ) / [α(1-γ) + (1-β)γ]

    where p = the p-value, α = the type I error rate, β = the type II error rate (so 1-β is the power at the specified alternative), and γ = the a priori probability that the null hypothesis is false.

    Also β* = Prob(H0 false | p > α) = βγ / [βγ + (1-α)(1-γ)].

    The quantities α* and β* are obtained by simply applying Bayes' rule.  O'Brien and Castelloe provide tables for these "crucial" type I and II errors based on assumed γ and specified α and β.  They point out that looking at α* and β* when performing sample size calculations might affect your decision about appropriate sample size.  To the point of our discussion, when sample size is small and power is low, α* can be high even though the p-value is less than α.  As an example, Lee and Zelen contend that in typical phase III clinical trials γ is about 0.3.
    For γ = 0.3, β = 0.7 and α = 0.05, α* = 0.28.

    This speaks directly to the trustworthiness of the p-value when the sample size is small and is the best argument I have seen yet regarding the importance of sample size with respect to p-values.
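
    The Bayes-rule arithmetic behind the crucial error rates is short enough to check directly. In this sketch the parameter names are my own labels: `alpha` is the type I error rate, `beta` is the type II error rate (so the power is 1 - beta), and `prior_false` is the assumed prior probability that the null hypothesis is false.

```python
def crucial_type1(alpha, beta, prior_false):
    """P(H0 true | test rejects), by Bayes' rule."""
    false_positive = alpha * (1 - prior_false)
    true_positive = (1 - beta) * prior_false
    return false_positive / (false_positive + true_positive)


def crucial_type2(alpha, beta, prior_false):
    """P(H0 false | test does not reject), by Bayes' rule."""
    false_negative = beta * prior_false
    true_negative = (1 - alpha) * (1 - prior_false)
    return false_negative / (false_negative + true_negative)


# The low-power case quoted above, next to a conventionally powered study.
for power in (0.3, 0.8):
    a_star = crucial_type1(0.05, 1 - power, 0.3)
    b_star = crucial_type2(0.05, 1 - power, 0.3)
    print(f"power={power:.1f}: crucial type I={a_star:.3f}, crucial type II={b_star:.3f}")
```

    Holding the significance level and the prior fixed, raising the power from 0.3 to 0.8 lowers the chance that a rejected null is actually true, which is the sense in which sample size bears on how much a significant p-value can be trusted.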

    References:
    Lee, S. J. and Zelen, M. (2000). Clinical trials and sample size considerations: Another perspective. Statistical Science 15, 95-100.
    O'Brien, R. G. and Castelloe, J. (2007).  Chapter 10: Sample Size Analysis for Traditional Hypothesis Testing: Concepts and Issues, pp. 237-271 in Pharmaceutical Statistics Using SAS: A Practical Guide.  The SAS Institute, Cary NC.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 47.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-05-2012 12:31

    These alternative types of errors have different interpretations than the frequentist p-value.  I do think that such posterior probabilities are very important for planning studies and drawing conclusions.  However, I don't see how this discussion goes to how trustworthy a single p-value is depending on the sample size.  We have to remember that p-values simply tell you how likely the data are assuming that H0 is true.  The p-value doesn't tell you how likely you are to make a type 1 error if you reject the null.  As pointed out, the posterior prob that the HA is true given your data depends on the prior prob of HA.  This prior prob depends on many things such as the quality of the science leading to the current study as well as the quality/effectiveness of the scientific team.
    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------








  • 48.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 15:11
    As a quick f/u of my last post.  I did a small calculation of power, using N = x^2/d^2, where x is (z-alpha + z-beta) and d is the effect size.  When results are barely significant and alpha is 0.05, z-alpha would be 1.96 and z-beta (with 50% power) is 0.  So with an average N=21, we would observe that the d, or the effect size of the difference, is around 0.43 standard deviations apart (a low to moderate effect size).  If you had planned the trial with 80% power, the trial N should have been 43 (replacing x^2 with 7.849).  So your client's study was half the size it should have been.  This is the real reason for your 'fragile findings of statistical significance', scientists who didn't plan their trial correctly!  With an adequately planned trial, removing one or two observations shouldn't change the overall 'statistically significant' conclusion.
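
    The back-of-the-envelope calculation can be reproduced with the standard normal quantile function. This is a sketch of the N = x^2/d^2 formula exactly as stated in the post, with a two-sided alpha of 0.05 assumed; the function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_quantile(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)


def required_n(power, effect_size, alpha=0.05):
    """Smallest N satisfying N = x^2 / d^2 with x = z_alpha + z_beta."""
    x = z_quantile(1 - alpha / 2) + z_quantile(power)
    return ceil(x ** 2 / effect_size ** 2)


# A barely significant result at N = 21 corresponds to 50% power (z_beta = 0),
# so the implied effect size is z_alpha / sqrt(N).
n_observed = 21
effect_size = (z_quantile(0.975) + z_quantile(0.50)) / sqrt(n_observed)

print("implied effect size d:", round(effect_size, 3))
for power in (0.50, 0.80, 0.90):
    x_squared = (z_quantile(0.975) + z_quantile(power)) ** 2
    print(f"power={power:.2f}: x^2={x_squared:.3f}, N={required_n(power, effect_size)}")
```

    For this effect size, x^2 is about 7.85 at 80% power, which gives N = 43, and about 10.51 at 90% power, which gives a larger N.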

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------








  • 49.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 17:29
    Since I started this thread and it has received a great deal of interest, I would like to give you more background and summarize what I got out of this.

    The conference I attended was a special conference in DC on Large Scale Inference honoring Brad Efron.  The comment on p-values was made by Carl Morris.  There seemed to be agreement but not much discussion or explanation about it. Later on I wondered about it and raised the issue with some of the participants via email, but only got a couple of responses: one from Efron, but nothing from Morris.

    So I decided to raise it here, figuring some of you would have some arguments supporting the statement.  Before this it was customary for me to tell clients who got significant results with small (underpowered) studies that statistical significance (low p-value) and randomization were enough to make the result publishable and that they were lucky to have a large effect size.  But after hearing Carl I figured there had to be some good reason behind his concern about small samples.  I think I got the answer from this discussion.  For asymptotic null distributions, small sample size can mean a poor approximation to the null distribution and, consequently, poor approximate p-values.  Even for exact tests, the p-value may not be robust, in the sense that small changes in the data set can lead to a large change in the p-value when the sample size is small. So I guess for exact tests when the data are known to be accurate the interpretation of a significant p-value is safe, but in other instances it may not be, and care should be taken.
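
    The robustness point for exact tests can be illustrated with a small sketch.  It uses an exact two-sided sign (binomial) test under H0: p = 0.5 as a stand-in; the test, the counts, and the function names are my own assumptions, not an example given at the conference or in this thread.

```python
from math import comb


def two_sided_sign_test_p(k, n):
    """Exact two-sided p-value for k successes in n trials under H0: p = 0.5."""
    k = max(k, n - k)  # the null distribution is symmetric, so fold to the upper tail
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


def flip_ratio(k, n):
    """Factor by which p grows when one success is recoded as a failure."""
    return two_sided_sign_test_p(k - 1, n) / two_sided_sign_test_p(k, n)


if __name__ == "__main__":
    # n = 20: changing a single observation moves p from about 0.041 to about 0.115.
    print(two_sided_sign_test_p(15, 20), two_sided_sign_test_p(14, 20))
    # Relative change from one flipped observation, small n versus large n.
    print(flip_ratio(15, 20), flip_ratio(273, 500))
```

    In this sketch a single changed observation nearly triples the p-value at n = 20, while at n = 500 the same change moves it by a much smaller factor.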

    There are many other issues with p-values that concern statisticians, and several of these were brought out: (1) strict interpretation of 0.05 for significance, (2) the need to adjust p-values for multiple testing (a particularly important issue with large scale inference, where the number of model parameters p can be large relative to the sample size n), and (3) assumptions that affect the null distribution of the test statistics.  As someone pointed out, some assumptions can be verified, in the sense that, say, normality can be checked via goodness of fit, correlations can be checked as a partial check for independence, and variance estimates can be checked to see whether two samples have the same variance.  But these assumptions are not completely checked, and some assumptions cannot be checked (such as how close the distribution of the test statistic is to its theoretical asymptotic distribution).

    I want to thank everyone for their insights.  Although some disagreements occurred, I think these were just differences in semantics rather than anything substantive.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 50.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 18:24

    I just thought I would add one last thing to share.  I have been working on genomic data with small sample sizes, so the discussion was very relevant to what I am currently doing.  I have attached the p-value distributions for the same data using 6 different candidate tests.  This experiment involved 2 groups with 4 "cases" and 7 "controls."  Not every test had data available for all 11 samples, with the sample sizes being at least 2 + 2.  The data were proportions for each sample for a large number of genetic sites (~ 2 million = # of tests).  These plots show p-value distributions that are of big concern to me.  At least one of the tests had a reasonable distribution.

    -------------------------------------------
    Robert Podolsky
    Georgia Health Sciences University
    -------------------------------------------






    Attachment: pval_dist1.pdf (6 KB)


  • 51.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-03-2012 10:59
    I would like to apologize for any confusion caused by my last post.  The alternative hypothesis - greater than, less than, or not equal - determines which tails are included in finding the p-value.  Also, because of the central limit theorem, for a large enough sample size, the p-value should be a reasonably accurate measure of the true probability, independent of the underlying distribution.  For small samples, one must assume a distribution to obtain a p-value, and the distribution can be wrong, which would make the p-value wrong.  Maybe that is why the professor cautioned about p-values in small samples.  Here, I am assuming that the data selection is such that the assumptions behind the distribution of the statistic are correct (e.g., independent variables, random selection with or without replacement).


    Margot


    -------------------------------------------
    Margot Tollefson
    Owner
    Vanward Statistical Consulting
    -------------------------------------------








  • 52.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-04-2012 09:49
    I would like to re-address the issue of non-normality in small samples.  The following is from a statistics text by William Hays.



    The first plot presents the original negatively skewed distribution (black line) and the normal distribution (dotted line).  The second plot gives the sampling distribution of means from the original distribution when N=2, already more normal than the original distribution.  The third and fourth plots present the sampling distribution when N=4 and 10, respectively.  As can be seen, when N=10 the sampling distribution is very, very close to a normal distribution. 

    I reiterate, when N=20, non-normality should not be a concern.

    If the data are non-normal, as Patrick Spagon suggested for 'quite skewed, particle data in the semiconductor business', then we should be aware of this issue and take preventive action, for example, using log or square root transformations.  Yes, in small samples we cannot prove (i.e., p < 0.05) non-normality or violations of other assumptions, but our (and the client scientist's) knowledge of the field should give us warning to apply normalizing transformations suitable for our parameters.

    What bothers me more is the action of some of my colleagues who have very large data sets (e.g., N=500), see a slight, but statistically significant, departure from normality and reject parametric testing.  Lunacy, sheer lunacy.

    In conclusion, when N is small (but at least 10), the central limit theorem will still bail us out.  When N is large, the central limit theorem will always allow for non-normality.  Our experience in the domain should allow us to further reduce the impact of non-normality, especially after a quick plot of the data.
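
    For those who want to see how quickly the skewness of a sample mean shrinks with N, here is a rough simulation sketch.  The parent distribution (a negated exponential, skewness -2) is an assumed stand-in and not the distribution in the Hays figure; the function names, seed, and replicate count are also my choices.  In theory the skewness of the mean is the parent skewness divided by sqrt(N).

```python
import random
from math import sqrt
from statistics import fmean


def skewness(xs):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5


def sample_mean_skewness(n, reps=20000, seed=0):
    """Skewness of the simulated sampling distribution of the mean of n draws."""
    rng = random.Random(seed)
    # Negated exponential draws: an assumed negatively skewed parent.
    means = [fmean([-rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)


if __name__ == "__main__":
    for n in (2, 4, 10, 20):
        simulated = sample_mean_skewness(n)
        theoretical = -2 / sqrt(n)
        print(f"N={n:2d}  simulated skewness={simulated:+.2f}  theory={theoretical:+.2f}")
```

    How much residual skewness is tolerable at a given N depends on how skewed the parent is and on the test being used, so this is only a quick way to look at the rate, not a rule.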

    I still feel that using only non-parametric hypothesis testing would preclude our ability to come up with useful parameter estimates and confidence intervals.  That, I feel, is our real goal: what is the treatment effect?  Overreliance on p-values only tells us whether the effect can be zero.  This is like having an encyclopedia article on elephants which only says 'Elephants are animals.'  I feel that non-parametric testing is only useful as a supportive analysis to confirm the basic findings of our point estimates and CIs.

    -------------------------------------------
    Allen Fleishman
    Allen Fleishman Biostatistics Inc.
    -------------------------------------------








  • 53.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-05-2012 10:29

    Michael and I independently present the same point. 

    Jon
    -------------------------------------------
    Jon Shuster
    University of Florida
    -------------------------------------------








  • 54.  RE:Does sample size really affect interpretation of p-values?

    Posted 01-05-2012 10:57

    Yes, but if you follow the discussion you will see that I have changed my opinion on concerns about p-values in small samples.
    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------