At a couple of different talks, I tried a few different methods to "prove" you can change more than one thing at a time. The first was a poll: I asked whether you can change more than one thing at a time during an experiment. 100% said no. When I asked why they felt that way, they said you can't tell what caused the outcome. I bet them I could make them change their answer.
Admittedly, I don't know the best way to do this. But I know these don't work.
1) I discussed an experiment to make NaCl from NaOH and HCl (table salt from sodium hydroxide and hydrochloric acid). The concentration of NaCl made is [NaCl] = min([NaOH], [HCl]). Only by using both will you get any table salt. To increase the yield, you must increase both at the same time.
Scientist A performed the experiment with the factors and levels:
Run  [HCl]  [NaOH]  [NaCl]
  1   0.0    0.0     0.0
  2   0.0    0.5     0.0
  3   0.0    1.0     0.0
  4   0.5    0.0     0.0
  5   1.0    0.0     0.0
This scientist concludes you can't make table salt from HCl and NaOH.
Scientist B performed the experiment with the factors and levels:
Run  [HCl]  [NaOH]  [NaCl]
  1   0.5    0.0     0.0
  2   0.5    0.5     0.5
  3   0.5    1.0     0.5
  4   0.0    0.5     0.0
  5   1.0    0.5     0.5
This scientist shows you CAN make NaCl.... But you can't make more than 0.5 M of it.
Scientist C performed the experiment with the factors and levels:
Run  [HCl]  [NaOH]  [NaCl]
  1   0.0    0.0     0.0
  2   0.0    1.0     0.0
  3   1.0    0.0     0.0
  4   1.0    1.0     1.0
  5   0.5    0.5     0.5
This scientist proved you can make as much NaCl as you want. You just need to increase both proportionally.
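To make the contrast concrete, here is a minimal Python sketch of all three run tables under the stated yield model, [NaCl] = min([NaOH], [HCl]). The function name and design labels are just illustrative:

# Yield model from the example: [NaCl] = min([NaOH], [HCl]).
def nacl_yield(hcl, naoh):
    return min(hcl, naoh)

designs = {
    "Scientist A (OFAT)": [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0),
                           (0.5, 0.0), (1.0, 0.0)],
    "Scientist B (OFAT)": [(0.5, 0.0), (0.5, 0.5), (0.5, 1.0),
                           (0.0, 0.5), (1.0, 0.5)],
    "Scientist C (factorial + center)": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                                         (1.0, 1.0), (0.5, 0.5)],
}

# Print the best yield each design can ever observe.
for name, runs in designs.items():
    best = max(nacl_yield(hcl, naoh) for hcl, naoh in runs)
    print(f"{name}: best [NaCl] = {best:.1f}")
# Scientist A (OFAT): best [NaCl] = 0.0
# Scientist B (OFAT): best [NaCl] = 0.5
# Scientist C (factorial + center): best [NaCl] = 1.0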
Someone stood up and said, "This is a stupid example. Everyone knows you need to change both at the same time!" (I won the bet! And lost the audience.)
I replied, "Scientists A and B used OFAT methods and got the wrong answer. When I asked if you thought you should change more than one thing at a time, you all said no. And you all know that is wrong."
For another talk, I discussed a system of linear equations. Each coefficient was different, yet you can still figure out the values of X, Y, and Z. This didn't work either. So I wasn't even going to try to explain how to solve overdetermined systems of linear equations.
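For the curious, a minimal sketch of that linear-algebra point, using a made-up overdetermined system (5 equations, 3 unknowns; every coefficient below is arbitrary): even though every equation involves all three unknowns at once, least squares still recovers X, Y, and Z.

import numpy as np

# Made-up 5x3 coefficient matrix: every "run" changes all three unknowns.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 3.0, 1.0],
              [1.0, 4.0, 2.0],
              [4.0, 1.0, 1.0]])
true_xyz = np.array([2.0, -1.0, 0.5])
b = A @ true_xyz  # right-hand sides (noise-free for clarity)

# Least squares solves the overdetermined system exactly here.
solution, *_ = np.linalg.lstsq(A, b, rcond=None)
print(solution)  # -> [ 2.  -1.   0.5]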
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
------------------------------
Original Message:
Sent: 03-06-2023 15:01
From: Lloyd Provost
Subject: Reproducibility in statistical analyses
Andrew,
In response to your question: How do you "prove" to scientists who have been doing research for decades that their OFAT methods are bad AND that you can and should change more than one thing at a time?
My experience has been that when they see or experience an important interaction of multiple factors that gives insight into (or even solves) a problem they are interested in, they become open to learning about multi-factor designs. I have found this situation much worse with health scientists (clinicians and researchers) than in other industries. I put together a research team during Covid to try to understand why, and we found some interesting history that explains how this happened.
We wrote about this in the attached paper: "It Is Time to Reconsider Factorial Designs: How Bradford Hill and R. A. Fisher Shaped the Standard of Clinical Evidence," Quality Management in Healthcare, Vol. 29, No. 2, April/June 2020.
------------------------------
Lloyd Provost
Associates in Process Improvement
Original Message:
Sent: 03-05-2023 16:43
From: Andrew Ekstrom
Subject: Reproducibility in statistical analyses
I am hosting a symposium for the American Chemical Society this summer. I've been looking for a topic to discuss during my "Data Science in Chemical Research" talk. I hadn't been able to come up with anything new this year.... until about 20 mins ago.
Do you mind if I borrow your idea about the analysis of "designed experiments" vs "observational" data analysis?
I'm thinking of a title like "Is scientific reproducibility a fallacy, a misunderstanding, or a sign of poorly designed experiments?" Who knows, I might actually get more than 5 people watching....
I teach my students about the reproducibility of results vs. conclusions. Adding in my data analysis as well as your idea for generating data with a macro, I should be able to show the power of designed experiments.... which brings up another challenge: How do you "prove" to scientists who have been doing research for decades that their OFAT methods are bad AND that you can and should change more than one thing at a time?
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 02-15-2023 07:28
From: Paul Mathews
Subject: Reproducibility in statistical analyses
Hi Andrew,
I do a series of similar exercises with my Design of Experiments students, but my exercises are much more basic than yours. The data for these exercises are generated with macros, so the underlying model is the same but there is superimposed random noise. When we use standard experiment designs, everyone gets similar answers. Later in class, to emphasize the value of using a designed experiment, I ask them to analyze data sets with more complicated structure, and of course their results diverge. We talk about some of the special methods available to analyze such data, but those are beyond the scope of the class. If I am successful, the students learn to prefer designed experiments and distrust models built from happenstance data. I am lucky that most of my students come from the engineering and science community, where they have some control over how their data are collected and they can choose to use designed experiments.
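(The macros themselves aren't shown in the thread, but a minimal Python sketch of the same idea, one fixed underlying model with a fresh draw of superimposed random noise per student, might look like the following. The model coefficients and noise level are purely illustrative.)

import numpy as np

rng = np.random.default_rng()  # each student effectively draws different noise

# Assumed fixed model: y = 3 + 2*x1 - 1.5*x2 + noise, the same for everyone.
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1.0, n)

# Fit by least squares; estimates cluster near the true (3, 2, -1.5).
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)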
Long ago Angela Dean from OSU or Susan Collins from Lubrizol (I think they were working together) told us about a data set from a supersaturated experiment design that was published with a request for people to analyze the data and submit their results. A prize was offered for the best analysis. The purpose of the exercise was to assess analysis reproducibility. My recollection was that a small subset of the analyses submitted were on target but most missed by a wide margin - similar results to yours.
Kaggle's competitions provide a similar opportunity to assess analysis reproducibility. I haven't invested any time there, so I don't know what kinds of problems have analyses that are shared. Maybe someone has already written something up? It would be fun/malicious to assign Kaggle projects with shared analyses for students to do their own analysis-reproducibility evaluation.
I'm going to share this thread with my students in the future when we reach this topic.
------------------------------
Paul Mathews
President
Mathews Malnar & Bailey, Inc.
Original Message:
Sent: 02-14-2023 15:35
From: Andrew Ekstrom
Subject: Reproducibility in statistical analyses
Thank you all for your comments. Please keep them coming. I am thankful for all of you who have and will participate.
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 02-10-2023 10:39
From: Andrew Ekstrom
Subject: Reproducibility in statistical analyses
Hello Everyone,
A few months ago, I gave some assignments to my Data Science students. I gave them a data set and told them to partition it into testing and training sets, then run various analyses on the data. Over the course of the class, they built Logistic Regression, CART, and Random Forest models, among others. The data itself had 20-30 variables to pick from and about 10,000 rows. These were in-class assignments and activities. I had students write the results of their analyses on the board so we could all see.
The end result was that most students did not get the same answers as to which variables in the data were "important". I had each student use a different random seed. Unlike a lot of the standard data sets out there, I made up these data points. So I know, for a fact, which variables were used to create the response variables, and I know how much randomness I added to each data set.
No one ever got all the right variables from their analyses. However, by pooling all of their results, we were able to find 60% to 80% of the variables I chose for the models. We were also able to eliminate 90%+ of the spurious correlations. We were even able to roughly categorize the variables into "Most Important", "Very Important", "Most Likely Random", and "Not Important" based upon how often certain variables came up in models, and those groupings correlated well with the size of the coefficients I used to create the models.
For example, if X1, X3, and X4 come up 90% to 100% of the time, X9, X13, and X17 come up 70% to 90% of the time, and X7, X11, X12, and X28 never come up, we can classify these groups as "Most Important", "Very Important", and "Not Important".
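A minimal sketch of how that frequency tally might be reproduced (the data-generating model, the L1-penalized logistic regression, and the seed count below are assumptions for illustration, not the exact class exercise):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical recreation: a known sparse model, many candidate variables,
# and a different random seed per "student".
rng = np.random.default_rng(0)
n, p = 10_000, 25
X = rng.normal(size=(n, p))
true_vars = [0, 2, 3, 8, 12, 16]  # the variables actually used
logits = X[:, true_vars] @ np.array([2.0, 1.5, 1.2, 0.8, 0.6, 0.4])
y = (logits + rng.normal(scale=2.0, size=n) > 0).astype(int)

# Tally how often each variable survives an L1-penalized fit across seeds.
counts = np.zeros(p)
seeds = range(30)  # one seed per "student"
for seed in seeds:
    X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.3, random_state=seed)
    # Small C = strong L1 penalty, so weak/noise variables drop out.
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.01)
    model.fit(X_tr, y_tr)
    counts += (model.coef_[0] != 0)

# Variables selected most often tend to be the ones with the largest true
# coefficients, matching the "Most Important" grouping described above.
freq = counts / len(seeds)
for j in np.argsort(-freq)[:8]:
    print(f"X{j + 1}: selected {freq[j]:.0%} of the time")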
I've been looking through the research a bit on reproducibility in statistical analyses. There is a lot of talk about making sure software is able to accurately reproduce results. But I am not aware of any articles that discuss or "prove" that the nature of the data itself might not lend itself to good reproducibility... at least not under certain conditions.
Is anyone aware of studies that show data analysis techniques are not that reproducible?
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
------------------------------