ASA Connect

  • 1.  Why are so many "Intro to Stats" books so out of order?

    Posted 01-03-2023 23:33
    For the first time in a few years, I have the privilege of using a statistics book of my choosing. My students are "intro to stats" students. When I look at how almost every "intro" book is set up, it seems like they all plagiarize each other's order and never think about why things are done in the order they are done. 

    For example, in a typical Chapter 2 or 3, there will be a section on correlation and linear regression. Then, later in the book, there will be a chapter on correlation and linear regression. In the early chapter, we treat the coefficients and the model as deterministic. Later on, we learn the coefficients are stochastic and we have a discussion about "Statistical Significance". Why can't we teach it once, and teach it the right way? Flopping back and forth is confusing, and having to unlearn things from an earlier chapter means we wasted time earlier in the book. That time could have been spent discussing multiple linear regression, which helps prove that the nonsense many scientists believe about experiments and data analysis is wrong. (How many scientists believe you can't change more than one thing at a time during an experiment because either "statistics doesn't allow this" or "You can't tell what had the effect on the dependent variable"?) 
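    A minimal sketch of the "change more than one thing at a time" point, under stated assumptions: the process below is hypothetical and noise-free (baseline 10, factor A adds 3, factor B adds 2 per coded unit), and the function names are made up for illustration. It runs a 2x2 full factorial in which both factors change between runs and still recovers each effect separately.

```python
from itertools import product


def factorial_runs():
    """All four runs of a 2x2 full factorial, factors coded -1 (low) / +1 (high)."""
    return list(product((-1, 1), repeat=2))


def simulated_response(a, b):
    """Hypothetical noise-free process (an assumption for illustration only)."""
    return 10 + 3 * a + 2 * b


def fit_effects(runs, responses):
    """Least-squares coefficients for intercept, A, B and the A*B interaction.

    With balanced -1/+1 coding the model columns are orthogonal, so each
    coefficient reduces to sum(column * y) / number_of_runs.
    """
    n = len(runs)
    columns = {
        "intercept": [1 for _ in runs],
        "A": [a for a, _ in runs],
        "B": [b for _, b in runs],
        "A*B": [a * b for a, b in runs],
    }
    return {
        name: sum(c * y for c, y in zip(col, responses)) / n
        for name, col in columns.items()
    }


runs = factorial_runs()
responses = [simulated_response(a, b) for a, b in runs]
effects = fit_effects(runs, responses)
print(effects)  # {'intercept': 10.0, 'A': 3.0, 'B': 2.0, 'A*B': 0.0}
```

    With real, noisy data the estimates would only approximate the true effects, but the design logic is the same: because the factor columns are orthogonal, the effect of A and the effect of B do not get mixed up.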

    The typical textbook teaches linear regression THEN ANOVA. Why? Shouldn't that be flipped?

    In a typical chapter 3 or 4, when we discuss "basics of probability", we teach the formula P(A or B) = P(A) + P(B) - P(A and B). We then discuss how to tell whether A and B are dependent or independent. It comes down to this: if P(A and B) = P(A)*P(B), the events are independent. Otherwise, they are dependent. Then, in a chapter 7, we discuss confidence intervals for a single proportion. Again, we teach, then unteach and reteach. We just wasted MORE time! Why?
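    The two rules in this paragraph can be checked by brute force. This is only an illustrative sketch: the two-dice sample space and the event names are assumptions chosen for the example, not something from a particular textbook.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of two fair six-sided dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on (die1, die2)."""
    hits = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(hits, len(OUTCOMES))


def first_is_six(outcome):
    return outcome[0] == 6


def second_is_six(outcome):
    return outcome[1] == 6


def total_is_twelve(outcome):
    return outcome[0] + outcome[1] == 12


def p_or(a, b):
    return prob(lambda o: a(o) or b(o))


def p_and(a, b):
    return prob(lambda o: a(o) and b(o))


def independent(a, b):
    """Product test: A and B are independent iff P(A and B) = P(A) * P(B)."""
    return p_and(a, b) == prob(a) * prob(b)


# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = p_or(first_is_six, second_is_six)
rhs = prob(first_is_six) + prob(second_is_six) - p_and(first_is_six, second_is_six)
print(lhs, rhs)                                    # 11/36 11/36
print(independent(first_is_six, second_is_six))    # True
print(independent(first_is_six, total_is_twelve))  # False
```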

    If we were good, we could discuss point estimates for means, proportions and standard deviations, and the confidence intervals for those values, earlier. That would allow us to almost eliminate an entire chapter of most textbooks. 

    That chapter 7, which is usually on confidence intervals for the point estimates I already mentioned, is followed by hypothesis tests. Those hypothesis tests are based upon p-values (which the ASA had some opinions about). Those p-values are based upon Z, t or Chi Sqr values, which are then used to create confidence intervals... Once we have those confidence intervals, we can run tests to see if the results will stand up to future experimentation. Between critical values, p-values and confidence intervals, we have 3 ways to tell if something is "significantly different". Without a calculator, critical values require one to memorize tables of data or look things up in an incomplete table. This makes them difficult for most people to interpret. P-values are faulty. Neither lends itself to asking how reproducible the results are. Because of how most people think and react to data, confidence intervals allow you to quickly see if new data confirms your results or your conclusion about your data. It's fairly easy to test the probability that others will confirm your conclusions or results. But we default to critical values and p-values. Why? 
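    To make the "3 ways" concrete, here is a rough sketch of one test decided by critical value, by p-value and by confidence interval. It assumes a two-sided one-sample z-test with a known population standard deviation, purely so it can run on the Python standard library; the function name and the numbers in the usage lines are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_three_ways(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test (sigma assumed known), decided three ways."""
    std_norm = NormalDist()                        # standard normal, mean 0, sd 1
    se = sigma / sqrt(n)                           # standard error of the mean
    z = (sample_mean - mu0) / se                   # test statistic
    z_crit = std_norm.inv_cdf(1 - alpha / 2)       # critical value (about 1.96 for alpha=0.05)
    p_value = 2 * (1 - std_norm.cdf(abs(z)))       # two-sided p-value
    ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
    return {
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "ci": ci,
        "reject_by_critical_value": abs(z) > z_crit,
        "reject_by_p_value": p_value < alpha,
        "reject_by_ci": not (ci[0] <= mu0 <= ci[1]),
    }


# Hypothetical numbers: H0 says the mean is 100, sigma = 10, n = 64.
far = z_test_three_ways(sample_mean=103, mu0=100, sigma=10, n=64)
near = z_test_three_ways(sample_mean=102, mu0=100, sigma=10, n=64)
for result in (far, near):
    print(result["reject_by_critical_value"],
          result["reject_by_p_value"],
          result["reject_by_ci"],
          result["ci"])
```

    All three routes give the same reject / do-not-reject answer here, but only the interval also shows a range of plausible values that a follow-up study can be compared against.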

    In general, I try to discuss confidence intervals somewhere in the chapter 2 to 4 range. If we are discussing continuous data, I start with 1-sample tests, then 2-sample tests, then ANOVA, then regression models. I show that a pooled 2-sample t-test gives the same results as ANOVA would. Then I show that ANOVA tests can be done as regression models. (I even discuss how to use simple linear regression models on "paired t-test" data.) Then I spend a lot of time discussing how we can run linear regression models. For proportions, 2-prop tests lead to Chi Sqr and then to Logistic Regression models.
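    The t-test / ANOVA / regression equivalence can be sketched in a few lines. The data values below are invented for illustration and the helper names are hypothetical; the point is only that, for two groups, the ANOVA F statistic equals the squared pooled t statistic, and a regression on a 0/1 group indicator gives the same t for its slope.

```python
from math import isclose, sqrt
from statistics import mean


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    sp2 = (ss_a + ss_b) / (na + nb - 2)            # pooled variance
    return (mb - ma) / sqrt(sp2 * (1 / na + 1 / nb))


def anova_f(groups):
    """One-way ANOVA F statistic."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def dummy_regression(a, b):
    """OLS of y on a 0/1 group indicator; returns (intercept, slope, t of slope)."""
    x = [0] * len(a) + [1] * len(b)
    y = list(a) + list(b)
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope / se_slope


# Made-up measurements for two groups.
group_a = [4.1, 5.0, 4.7, 5.3, 4.4]
group_b = [5.9, 6.3, 5.4, 6.8, 6.1, 5.7]

t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])
intercept, slope, t_slope = dummy_regression(group_a, group_b)

print(isclose(f_stat, t_stat ** 2, rel_tol=1e-9))                      # F equals t squared
print(isclose(t_slope, t_stat, rel_tol=1e-9))                          # slope t equals pooled t
print(isclose(slope, mean(group_b) - mean(group_a), rel_tol=1e-9))     # slope is the mean difference
```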

    In the case of Chi Sqr tests, we often see researchers categorize continuous values, just to fit them in a Chi Sqr table, when we all know using logistic regression and keeping the continuous data continuous is a FAR BETTER idea.  

    To me, showing that we can see the effect of many things on the outcome NEEDS TO BE a goal. Most scientists NEED to know this. 

    I have issues with about a dozen other topics taught in a typical "intro to stats" class. But, I'll save them for later.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------


  • 2.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-04-2023 07:59
    If and when you do choose a book for your course, please let us know your choice.

    Giles Warrack

    ------------------------------
    Giles Warrack
    Retired
    North Carolina A&T State University
    ------------------------------



  • 3.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-04-2023 09:55
    If I could, I would use a Business Stats textbook. The fear is that if I used such a textbook, the world would blow up... or some other nonsense like that. 

    When I teach at community colleges, they fear that using a good business stats textbook would make it hard for their classes to transfer elsewhere, even though most of the students in those classes are taking my class to count as a Business Stats 1 class, or are going into a program where that distinction doesn't matter. 

    For the classes I'm teaching right now, they are MPH and EHS stats classes. The books I can use are fairly limited based upon the assumptions of the entering students. Using a good business stats textbook would be HIGHLY frowned upon. 

    Thanks for the reply.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 4.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-04-2023 09:22

    There are two possible reasons why a student would want to take an intro to stat course:

    1) it is a required course
    2) the student is genuinely interested in understanding statistical concepts

    Most students take the course because it's required.  Their primary interest is to pass the course.  It will be ages before they apply the material elsewhere, if at all.  They want to be told what to do and how (which buttons/icons to press on their laptop).  They have little interest in the why.  In general, only those who take statistics as a major are interested in the why.  Since the vast majority who take intro to stat are of the first variety, most intro textbooks are written for them.  If intro textbooks mostly follow the same pattern, it would seem that there is a general consensus that this is the most efficient approach.  After all, most of the students of the first variety will not remember any formulae two hours after the final exam, and the best we can hope for is that they retain the basic concepts.  Many intro textbooks are constructed so that if a student would chance to actually need the material, the book will come in handy.

    I would have qualms about trying to create a general formula from which many things follow as special cases, such as ANOVA/regression.  That may be great for the type 2) student who is mathematically oriented, but the type 1) student wouldn't care to know that these are cousins.  Furthermore, statistical concepts are initially foreign to most students, so taking it slow has its merits.  Repetition (regression I and regression II) is not necessarily a waste of time.  I would question whether squeezing things into a single frame in order to cover more in the course is an advantage.

    I do find it a pity that the soul of statistics – the intuition behind concepts – is often left out.  There is much more intuition behind defining independence as P(A|B)=P(A) than via P(A and B)=P(A)P(B) – the idea that independence means that the added information has no effect on the original probability is widely unknown, misunderstood or forgotten.   What's the intuition behind the average as a measure of center?  (The ancients didn't know the average – in fact, it's barely 400 years old!)  Is there any intuition besides computational convenience in describing variability by the standard deviation?  (Yes – Pythagoras!)
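    As a small illustration of the "added information has no effect" reading of independence, here is a sketch that compares P(A|B) with P(A) directly. The card-deck setup and the event names are assumptions picked for the example.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely cards


def prob(event, given=None):
    """Exact P(event) or, if `given` is supplied, P(event | given) for one drawn card."""
    pool = DECK if given is None else [card for card in DECK if given(card)]
    return Fraction(sum(1 for card in pool if event(card)), len(pool))


def is_ace(card):
    return card[0] == "A"


def is_heart(card):
    return card[1] == "hearts"


def is_king(card):
    return card[0] == "K"


def is_face(card):
    return card[0] in {"J", "Q", "K"}


# Learning the suit tells you nothing about whether the card is an ace.
print(prob(is_ace), prob(is_ace, given=is_heart))   # 1/13 1/13
# Learning the card is a face card changes the chance that it is a king.
print(prob(is_king), prob(is_king, given=is_face))  # 1/13 1/3
```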

    I was enthralled by the Freedman, Pisani and Purves textbook "Statistics", but I have to admit that when I used it I scored low on students' evaluations.  C'est la vie.



    ------------------------------
    Moshe Pollak
    Professor
    ------------------------------



  • 5.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-04-2023 10:24
    When we teach a basic "Intro to Stats" class, I approach it knowing, full well, that most of my students WILL NEVER take another stats class. A lot of my colleagues feel otherwise.

    The questions I ask and discuss in my classes are relevant to most students. They might not use everything every day. But they know how they can use what I teach in a lot of areas that they never knew about. I also tend to have 90% plus of my students show up every class period. Even so, my faculty evals tend to be high, except when it comes to textbook use. I follow what I think is the best way to go. They want me to do things in the textbook order. 

    Because I have such a diverse group of students, at least most of the time, I cover more material than is in the typical intro book too. My business students would cover Poisson, Log Normal and Exponential distributions in their B-stats 1 classes. Most of the intro to stats books make these either "optional" or skip them. 

    I honestly think most textbooks tend to be in the same order not because it is the best way to go. Rather, at places like the community colleges I teach at, we can have someone who took one stats class as an undergraduate, now teaches in the math department, and will teach a stats class. Whereas a Business Stats class or a Stats for Engineers class will be taught by someone who has taken multiple stats classes and has a better understanding of what to do and why you do it that way. Imagine if everything you ever learned in your single stats class ended around ANOVA and Simple Linear Regression. Now you are asked to discuss multiple linear regression and logistic regression (both of which are routinely taught in a 4cr or even 3cr business stats class), and you never took a class that used these topics. 


    Thanks for the reply.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 6.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-04-2023 11:34
    Hi Andrew
      I generally don't participate in these discussions, but you might take a look at a book titled "Essential Statistics with Python and R" downloadable at:
    https://escholarship.org/uc/item/03w0n5g3
    This is free, and has been downloaded thousands of times by instructors and students.
              S. Rao Jammalamadaka (Distinguished Professor, Dept of Stats, UCSB)

    ------------------------------
    Sreenivasa Rao Jammalamadaka
    Distinguished Professor
    ------------------------------



  • 7.  RE: Why are so many "Intro to Stats" books so out of order?

    Posted 01-07-2023 19:17
    I actually have a student who would be interested in this for his own purposes. Thanks.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------