In my stats classes, I understand that most of my students, probably all of them, are never going to take another stats class after this. So, I try to focus on making them aware of what is possible and discussing how they can be fooled by "pretty pictures".
For example, I gave them student-level data on course number, semester, prof ID and student grade. I had them do some data prep on 2-3 of the 20+ classes I had data on. They turned semester into Term and Year, then ran a regression model on that data. I then asked them to optimize the response. The result was the highest GPA for the classes they looked at.
Do they understand the underlying mechanisms of how the regression worked or how the response was optimized? No.
Do they understand you can take a lot of data, prep it, and analyze more than one thing at a time? Yes.
Do they know you can make a model and use it to make things better or help make important decisions? Yes.
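For anyone who wants to try the same kind of exercise, here is a minimal sketch in Python. It is not what my students actually ran: the rows, the "Fall 2017" semester format, and the helper names `split_semester` and `design_row` are all made up for illustration. It splits semester into Term and Year, fits a least-squares model with dummy variables for term and professor, and then picks the combination with the highest fitted grade.

```python
import itertools
import numpy as np

# Invented rows: (semester, prof_id, grade_points). The "Fall 2017" string
# format and every value here are assumptions for illustration only.
rows = [
    ("Fall 2017", "P1", 3.0), ("Fall 2017", "P2", 3.4),
    ("Spring 2018", "P1", 2.8), ("Spring 2018", "P2", 3.2),
    ("Fall 2018", "P1", 3.1), ("Fall 2018", "P2", 3.5),
    ("Spring 2019", "P1", 2.9), ("Spring 2019", "P2", 3.3),
]

def split_semester(semester):
    """Data prep: turn 'Fall 2017' into ('Fall', 2017)."""
    term, year = semester.split()
    return term, int(year)

terms = sorted({split_semester(s)[0] for s, _, _ in rows})
profs = sorted({p for _, p, _ in rows})

def design_row(term, prof):
    # Intercept plus dummy variables, dropping the first level of each factor.
    # Year is parsed above but left out of this toy model to keep it small.
    return ([1.0]
            + [float(term == t) for t in terms[1:]]
            + [float(prof == p) for p in profs[1:]])

X = np.array([design_row(split_semester(s)[0], p) for s, p, _ in rows])
y = np.array([g for _, _, g in rows])

# Ordinary least squares fit of grade on term and professor.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def fitted(term, prof):
    return float(np.dot(design_row(term, prof), coef))

# "Optimize the response": search every term/professor combination
# for the one with the highest fitted grade.
best = max(itertools.product(terms, profs), key=lambda tp: fitted(*tp))
print(best, round(fitted(*best), 2))
```

With only two small factors, a brute-force search over the combinations is enough. A real data set with more predictors would need more care, and the "best" setting is only as trustworthy as the model behind it.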
On the exam my students are taking right now (yes, I am watching them suffer at this very minute), I gave them an ANOVA table and a table of coefficients from a regression model on life expectancy for several countries. The data run from 1950 to 2015. I asked them to use the regression model to predict the life expectancy of Australians in 2007 and in 2177. They get good results for 2007. (It makes sense.) The model predicts life expectancy to be over 700 years in 2177. (That doesn't make sense.) But they do "remember" discussing extrapolation of models beyond the data we have, how forecasts tend to be wrong, how a good model is good over a select bit of data, how we should let the data tell us what to do rather than tell the data how it will behave, etc.
Do my students understand that models have limits on their usefulness? Yes.
Do they question the validity of results? Yes.
Do they trust "projections" and "forecasts" far into the future? No.
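The extrapolation point can be sketched in a few lines of Python. To be clear, this is not the exam's model or data: the numbers below are invented (a straight-line trend from 1950 to 2015), and `predict` is a hypothetical helper. The only idea it illustrates is checking whether a requested year falls inside the range the model was fitted on.

```python
import numpy as np

# Invented data, NOT the exam's data set: life expectancy rising
# 0.2 years per calendar year from 69 in 1950, sampled every 5 years.
years = np.arange(1950, 2016, 5)
life_exp = 69.0 + 0.2 * (years - 1950)

# Fit a straight line: life_exp ~ intercept + slope * year
slope, intercept = np.polyfit(years, life_exp, deg=1)

def predict(year):
    """Return (prediction, is_extrapolation) for a given year."""
    inside = bool(years.min() <= year <= years.max())
    return float(slope * year + intercept), not inside

for y in (2007, 2177):
    pred, extrap = predict(y)
    note = " (outside the fitted range, treat with suspicion)" if extrap else ""
    print(f"{y}: {pred:.1f} years{note}")
```

The model will happily return a number for any year you hand it. It is the range check, not the regression output, that tells you the 2177 figure should not be trusted.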
We discussed lurking variables on many occasions. I used my time doing student-level data analysis as an example. I can make boxplots of student GPA in certain classes and break that down by race and gender. I tell them, "If we were silly or dumb, we would look at this data and see that white students have higher GPAs than Black, Hispanic and other groups. What is the first thought that pops into your mind when you see that?" After a very awkward pause, someone will give an answer like, "It suggests that white people are smarter." That raises the question, "Does skin color really affect IQ?" (No one believes that.) Then I ask them to give other reasons why the GPAs are different. When I make boxplots of GPA by race and "Pell Eligibility", the differences between races almost totally disappear. The next question is, "Did your thoughts change from the first set of boxplots to the second set?" (Pell Eligibility is a measure of poverty in the family and a great predictor of student success/failure.)
Do my students understand the social, political, economic underpinnings of poverty? No.
Do my students know that poverty is a driving force in low student GPAs and low success rates at college/university? Yes.
Do they know not to be fooled by pretty pictures that "tell the story"? Yes.
Do they know to ask better questions and not make assumptions based on how data is presented? Yes.
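Here is a small Python sketch of how a lurking variable can produce that pattern. The records are entirely invented, with neutral group labels "A" and "B" instead of real demographic categories, and `mean_gpa_by` is a hypothetical helper. In this toy data GPA depends only on Pell eligibility; the two groups differ only in how many of their students are Pell eligible.

```python
from collections import defaultdict
from statistics import mean

# Invented records: (group, pell_eligible, gpa). GPA here depends only on
# Pell eligibility; the groups differ only in their share of eligible students.
records = (
    [("A", False, 3.4)] * 8 + [("A", True, 2.6)] * 2 +
    [("B", False, 3.4)] * 3 + [("B", True, 2.6)] * 7
)

def mean_gpa_by(key):
    """Average GPA within each bucket defined by key(group, pell)."""
    buckets = defaultdict(list)
    for group, pell, gpa in records:
        buckets[key(group, pell)].append(gpa)
    return {k: round(mean(v), 2) for k, v in buckets.items()}

# First "picture": GPA by group only. A gap appears.
by_group = mean_gpa_by(lambda g, p: g)

# Second "picture": GPA by group and Pell eligibility. The gap is gone.
by_group_and_pell = mean_gpa_by(lambda g, p: (g, p))

print(by_group)
print(by_group_and_pell)
```

Nothing about the group label itself changes GPA in this data. The gap in the first summary comes entirely from the different mix of Pell-eligible students, which is the kind of question a second plot should prompt.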
If you are looking for data literacy for the masses, that would be a good place to start: make your employees aware of what is possible. Teach them not to let pretty pictures fool them, to ask deeper questions, and to look for reasons why the data is the way it is rather than accept that "a smart person" said it is so, therefore it must be. Most importantly, teach them to know their limitations and to ask for help with bigger issues.
If you want all the employees to be sophisticated statisticians, that would be a good first step.
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
------------------------------
Original Message:
Sent: 04-12-2019 07:07
From: Fred Hulting
Subject: Data Literacy Course/Curriculum
There is a desire in my company to establish "data literacy" training (Gartner continues to pose this as a big challenge for companies as they go "digital").
Within our R&D organization we have an established statistics curriculum which includes some of these ideas. However, I would like to update that content to reflect current thinking on data and statistical literacy. My question: what does a modern course or curriculum in "data literacy" look like? Does anyone have references they would recommend, or specific topics/ideas/approaches to suggest?
Thanks in advance for your input.
Fred
------------------------------
Fred Hulting
Director, Global Knowledge Services
General Mills, Inc.
------------------------------