For the first time in a few years, I have the privilege of using a statistics book of my choosing. My students are "intro to stats" students. When I look at how almost every "intro" book is set up, it seems like they all plagiarize each other's order and never think about why things are done in the order they are done.
For example, in a typical Chapter 2 or 3, there will be a section on correlation and linear regression. Then, later in the book, there will be a chapter on correlation and linear regression. In the early chapter, we treat the coefficients and the model as deterministic. Later on, we learn the coefficients are stochastic and we have a discussion about "Statistical Significance". Why can't we teach it once, and teach it the right way? Flopping back and forth is confusing, and having to unlearn things from an earlier chapter means we wasted time earlier in the book. That time could have been spent discussing multiple linear regression, which helps show that the nonsense many scientists believe about experiments and data analysis is wrong. (How many scientists believe you can't change more than one thing at a time during an experiment because either "statistics doesn't allow this" or "You can't tell what had the effect on the dependent variable"?)
The typical textbook teaches linear regression THEN ANOVA. Why? Shouldn't that be flipped?
In a typical chapter 3 or 4, when we discuss "basics of probability", we teach the formula P(A or B) = P(A) + P(B) - P(A and B). We then discuss how to tell if A and B are dependent or independent. It comes down to this: if P(A and B) = P(A)*P(B), they're independent. Otherwise, they're dependent. Then, in a chapter 7, we discuss confidence intervals for a single proportion. Again, we teach, then unteach and reteach. We just wasted MORE time! Why?
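For anyone who wants to see the two probability rules above in action, here is a minimal sketch. The fair die and the events A, B and C are my own illustrative choices, not something taken from a particular textbook.

```python
from fractions import Fraction

# One roll of a fair six-sided die (an illustrative sample space).
outcomes = set(range(1, 7))

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}      # the roll is even
B = {1, 2, 3, 4}   # the roll is at most 4
C = {4, 5, 6}      # the roll is at least 4

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A | B)
addition_rule = prob(A) + prob(B) - prob(A & B)

# Independence check: independent exactly when P(A and B) == P(A) * P(B)
a_b_independent = prob(A & B) == prob(A) * prob(B)
a_c_independent = prob(A & C) == prob(A) * prob(C)

print(p_a_or_b, addition_rule)           # 5/6 5/6
print(a_b_independent, a_c_independent)  # True False
```

Using exact fractions avoids floating-point rounding, so the equality test for independence is safe here. With estimated probabilities from real data, an exact equality check would not be appropriate.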
If we were good, we could discuss point estimates for means, proportions and standard deviations, along with the confidence intervals for those values, earlier. That would allow us to almost eliminate an entire chapter of most textbooks.
That chapter 7, which is usually on confidence intervals for the point estimates I already mentioned, is followed by hypothesis tests. Those hypothesis tests are based upon P-values (which the ASA had some opinions about). Those P-values are based upon Z, t or Chi Sqr values, which are then used to create confidence intervals. Once we have those confidence intervals, we can run tests to see if the results will stand up to future experimentation. Between critical values, p-values and confidence intervals, we have 3 ways to tell if something is "significantly different". Without a calculator, critical values require one to memorize tables of data or look things up in an incomplete table. This makes them difficult for most people to interpret. P-values are faulty. Neither lends itself to asking how reproducible the results are. Because of how most people think and react to data, confidence intervals allow you to quickly see if new data confirms your results or confirms your conclusion about your data. It's fairly easy to test the probability that others will confirm your conclusions or results. But we default to critical values and p-values. Why?
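To make the "3 ways" concrete, here is a minimal sketch of a two-sided one-sample z-test, where all three approaches give the same reject/fail-to-reject decision. The data, the null value of 100, and the "known" sigma of 15 are all made-up assumptions. I use a z-test only because Python's standard library has the normal distribution built in; an intro class would more often use t.

```python
from statistics import NormalDist, mean

# Hypothetical setup: test H0: mu = 100 with a known sigma of 15 at alpha = 0.05.
sample = [112, 104, 97, 118, 109, 101, 115, 108, 99, 111]
mu_0, sigma, alpha = 100.0, 15.0, 0.05

std_normal = NormalDist()
n = len(sample)
x_bar = mean(sample)
std_err = sigma / n ** 0.5
z = (x_bar - mu_0) / std_err

# Way 1: compare |z| to the critical value.
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) > z_crit

# Way 2: compare the two-sided p-value to alpha.
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_by_p_value = p_value < alpha

# Way 3: check whether the (1 - alpha) confidence interval excludes mu_0.
ci_low = x_bar - z_crit * std_err
ci_high = x_bar + z_crit * std_err
reject_by_interval = not (ci_low <= mu_0 <= ci_high)

print(reject_by_critical_value, reject_by_p_value, reject_by_interval)
print(f"CI: ({ci_low:.2f}, {ci_high:.2f})")
```

The decisions always agree in this setting, but only the interval also reports a range of plausible values in the units of the data.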
In general, I try to discuss confidence intervals in the chapter 2 to 4 range. If we are discussing continuous data, I start with 1-sample tests, then 2-sample tests, then ANOVA, then regression models. I show that a pooled 2-sample t-test gives the same results as ANOVA would. Then I show that ANOVA tests can be done as regression models. (I even discuss how to use simple linear regression models on "paired t-test" data.) Then I spend a lot of time discussing how we can run linear regression models. For proportions, 2-prop tests lead to Chi Sqr tests, which lead to logistic regression models.
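The chain from pooled t-test to ANOVA to regression can be checked by hand. Below is a minimal sketch with made-up data for two groups: the pooled t statistic squared equals the one-way ANOVA F statistic, and the t statistic for the slope of a regression on a 0/1 group indicator equals the pooled t statistic. The variable names are my own.

```python
from statistics import mean

# Made-up measurements for two groups (illustrative only).
group_a = [4.1, 5.0, 4.7, 5.3, 4.9]
group_b = [5.8, 6.1, 5.5, 6.4, 6.0, 5.9]

def sum_sq(values):
    """Sum of squared deviations from the sample mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

n_a, n_b = len(group_a), len(group_b)
df_error = n_a + n_b - 2

# 1) Pooled two-sample t-test.
pooled_var = (sum_sq(group_a) + sum_sq(group_b)) / df_error
t_pooled = (mean(group_b) - mean(group_a)) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5

# 2) One-way ANOVA with two groups (1 numerator degree of freedom).
grand = mean(group_a + group_b)
ss_between = n_a * (mean(group_a) - grand) ** 2 + n_b * (mean(group_b) - grand) ** 2
ss_within = sum_sq(group_a) + sum_sq(group_b)
f_anova = (ss_between / 1) / (ss_within / df_error)

# 3) Simple linear regression of y on a 0/1 indicator for group B.
x = [0.0] * n_a + [1.0] * n_b
y = group_a + group_b
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx                      # equals mean(group_b) - mean(group_a)
intercept = y_bar - slope * x_bar      # equals mean(group_a)
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
t_slope = slope / ((sse / df_error) / sxx) ** 0.5

print(t_pooled, t_slope)        # the same value, up to rounding
print(t_pooled ** 2, f_anova)   # the same value, up to rounding
```

Because the test statistics match and the error degrees of freedom are the same, the p-values match as well.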
In the case of Chi Sqr tests, we often see researchers categorize continuous values, just to fit them in a Chi Sqr table, when we all know using logistic regression and keeping the continuous data continuous is a FAR BETTER idea.
To me, showing that we can see the effect of many things on the outcome NEEDS TO BE a goal. Most scientists NEED to know this.
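As a small illustration of seeing the effect of more than one thing at once, here is a sketch of a balanced 2x2 factorial experiment in which both factors change across runs. The factor names (temperature, catalyst) and the yields are hypothetical. Because the design columns are coded -1/+1 and are mutually orthogonal, the multiple regression coefficients reduce to simple dot products, so no matrix library is needed. That shortcut is an assumption that holds only for orthogonal designs like this one.

```python
# Hypothetical 2x2 factorial experiment with two replicates.
# Each run: (temperature coded -1/+1, catalyst coded -1/+1, measured yield).
runs = [
    (-1, -1, 60.2), (+1, -1, 70.1), (-1, +1, 64.9), (+1, +1, 75.2),
    (-1, -1, 59.8), (+1, -1, 69.9), (-1, +1, 65.1), (+1, +1, 74.8),
]

def coefficient(column, response):
    """Least-squares coefficient for one column of an orthogonal design."""
    return sum(c * r for c, r in zip(column, response)) / sum(c * c for c in column)

temp = [r[0] for r in runs]
cat = [r[1] for r in runs]
y = [r[2] for r in runs]
ones = [1] * len(runs)

# Orthogonality check: the shortcut above is only valid when this is zero.
temp_dot_cat = sum(t * c for t, c in zip(temp, cat))

# Fit y = b0 + b_temp * temp + b_cat * cat.
b0 = coefficient(ones, y)
b_temp = coefficient(temp, y)
b_cat = coefficient(cat, y)

print(temp_dot_cat)        # 0
print(b0, b_temp, b_cat)   # approximately 67.5, 5.0, 2.5
```

Both factors were changed during the same experiment, and the fit still separates the temperature effect from the catalyst effect.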
I have issues with about a dozen other topics taught in a typical "intro to stats" class. But, I'll save them for later.
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
------------------------------