SSPA Blog: The (Un)Importance of the Normality Assumption in Applied Statistics

  

"In theory, theory and practice are the same. In practice, they are not."

If you look up this quote, you'll find it has been attributed to many people, from Albert Einstein to Yogi Berra. Regardless of who might have said it first, the quote applies to statistics and the practice of statistics:

In theory, the theory and practice of statistics are the same. In practice, they are not.

That's why I'm so excited about the ASA's new winter Conference on Statistical Practice. The ASA is smart to reach out to practicing statisticians, statistical programmers, and data analysts, and to provide us with opportunities that go beyond the traditional JSM talks and workshops. I'm not sure what the conference will be like, but I hope there will be many opportunities to share best practices and to exchange information.

For example, I hope that participants have the opportunity to discuss assumptions: when they are needed and when they are not.

The kind of discussion I'd like to see is found in the article, "The Importance of the Normality Assumption in Large Public Health Data Sets" by T. Lumley, P. Diehr, S. Emerson, and L. Chen (Annu. Rev. Public Health, 2002). This article discusses in plain language the role of the normality assumption in linear regression and t tests.

The authors are frank, honest, and helpful in describing situations in which public health researchers can use linear regression and t tests, rather than more complicated statistical techniques. Among the take-aways, they mention that for these two techniques:

  • "It is rarely necessary to worry about non-Normality of outcome variables" when you are concerned about predictions (p. 164).
  • More important than the normality of the response is whether the variance of the response is constant (that is, not heteroscedastic) (p. 164).
  • "In small samples most statistical methods do require distributional assumptions" (p. 152, emphasis mine).
  • In real data, "sufficiently large" is often less than 100 observations, and even for extremely non-Normal data, it is less than 500 (p. 166).
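
One way to get a feel for the "sufficiently large" point is a small simulation. The sketch below is my own illustration, not something taken from Lumley et al.: it draws samples from a skewed Exponential population (an assumed stand-in for non-Normal data), builds a 95% t interval for the mean from each sample, and estimates how often the interval covers the true mean. The sample sizes, the number of replications, the seed, and the function name t_interval_coverage are all arbitrary choices, and the critical values are hardcoded, rounded table values so that only the standard library is needed.

```python
import math
import random

# Two-sided 95% critical values of Student's t for df = n - 1.
# Hardcoded (rounded) table values; an assumption made to avoid
# depending on SciPy for the t quantile function.
T_CRIT = {10: 2.262, 100: 1.984, 500: 1.965}


def t_interval_coverage(n, reps=2000, true_mean=1.0, seed=0):
    """Estimate how often a 95% t interval covers the true mean when
    samples of size n come from a skewed Exponential population."""
    rng = random.Random(seed)
    t_crit = T_CRIT[n]
    hits = 0
    for _ in range(reps):
        # expovariate takes the rate, which is 1 / mean
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        mean = sum(sample) / n
        # Unbiased sample variance (divides by n - 1)
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        half_width = t_crit * math.sqrt(var / n)
        if abs(mean - true_mean) <= half_width:
            hits += 1
    return hits / reps


coverage = {n: t_interval_coverage(n) for n in T_CRIT}
for n, cov in coverage.items():
    print(f"n = {n:3d}: estimated coverage = {cov:.3f}")
```

If the claims quoted above carry over to this toy setting, the estimated coverage should fall short of 95% for the smallest sample and move toward the nominal level as n grows. The exact numbers depend on the seed and on the assumed population, so treat them as a rough check rather than a result.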

I hope that the Conference on Statistical Practice includes several papers and presentations like the one by Lumley et al. Assumptions are necessary, both in theory and in practice. But in practice, it is easy to get confused about the theory.

Comments

07-22-2011 10:42

I'm reminded of a T-shirt slogan seen at the University of Chicago: "Sure it works in practice, but what about in theory?"
In any case, I think recent activity by the ASA to reach out to applied statisticians in industry and in academia is a great thing.