ASA Connect

  • 1.  Improving student outcomes in math/Is reproducibility even a possibility presentations

    Posted 03-11-2024 08:27

    In January, I gave a couple of talks that most of you should find interesting.

    The first one is a comparison study of math students at Oakland University, Eastern Michigan University, Oakland Community College, and Henry Ford College. The linked video is from the version of the talk I gave at Oakland Community College. I got student-level data and was able to track student progress from their first math class to their last. For anyone interested in getting MORE STEM students into their programs, the info in it is really useful. I can even share all the data I used. The title of the talk is VERY appropriate: Mathemassacre.

    YouTube link:  https://youtu.be/USFdGxNMjAU

    The second talk is from the Detroit ASA section meeting. The title of this talk: Is Reproducibility Even a Possibility? It ended up being a comparison study of R vs Python, logistic regression vs decision trees, and which important or "statistically significant" terms pop up in each type of model under several different levels of signal to noise. The big takeaway here is something I do with my students when I can: give all of your students the same data set. Have them partition the data using different random seeds, say the last four digits of their student IDs. Then have them run whatever algorithm you are discussing and report the "important features" or "statistically significant" terms on the board. Then discuss why no one found the same results. (A sketch of this exercise appears below.)

    Having done this before with "real" data, I found that the software might flag, say, 5-8 variables as important for a given random seed. But if I start at the beginning and redo the same analysis with a different random seed, I'll get another 5-8 variables, of which only 1-2 will be the same between the two analyses. Repeat the analysis with several different random seeds and you begin to see each model as a mere opinion, one for which a second, third, fourth, fifteenth opinion is needed. It also hints at why Random Forests are built as an ensemble in which the random seed effectively changes each time a new tree is grown, rather than using a new random starting point with the same partition.
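
    Here is a minimal sketch of the exercise in Python, using synthetic data as a stand-in for whatever data set you hand out; the column names, signal strengths, number of "students", and the 0.05 cutoff are all assumptions for illustration:

        from itertools import combinations
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n, p = 400, 12
        X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{j}" for j in range(p)])
        eta = 0.4 * X["x0"] - 0.3 * X["x1"] + 0.2 * X["x2"]  # weak true signal
        y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

        def significant_terms(seed, alpha=0.05):
            """One student's analysis: partition on their seed, report p < alpha terms."""
            X_tr, _, y_tr, _ = train_test_split(X, y, test_size=0.3, random_state=seed)
            fit = sm.Logit(y_tr, sm.add_constant(X_tr)).fit(disp=0)
            return set(fit.pvalues[fit.pvalues < alpha].index) - {"const"}

        # Thirty "students", each with their own seed; then compare the reports.
        reports = {seed: significant_terms(seed) for seed in range(30)}
        sizes = [len(s) for s in reports.values()]
        overlaps = [len(a & b) for a, b in combinations(reports.values(), 2)]
        print(f"avg terms reported: {np.mean(sizes):.1f}; "
              f"avg overlap between two students: {np.mean(overlaps):.1f}")

    The printout is meant to make the disagreement between seeds concrete: the gap between the size of each report and the pairwise overlap is what drives the classroom discussion.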

    YouTube link: https://youtu.be/sYPvCE_au4Q



    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------


  • 2.  RE: Improving student outcomes in math/Is reproducibility even a possibility presentations

    Posted 03-12-2024 09:00

    Andrew Ekstrom writes, "... the software might flag, say, 5-8 variables as important for a given random seed. But if I start at the beginning and redo the same analysis with a different random seed, I'll get another 5-8 variables, of which only 1-2 will be the same between the two analyses. Repeat the analysis with several different random seeds ..."

    This idea is very close to an idea I called "near-optimization" that I promoted and tried to fund when I was a program manager for the Army Research Office. The general idea is that in many statistics problems there are many models that do a good job of describing the data, and that we may be better served by finding lots of good models rather than seeking the single best model. To see why, suppose that all good models agree, say, that X1 is an important regressor. Then we have more confidence in X1 than when some good models do not include X1. I used to give a talk on this topic called "Suboptimal is Best, lots of it."
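
    A rough sketch of what near-optimization might look like on a toy regression problem: fit every subset of regressors, keep every model whose AIC is within a small tolerance of the best one, and count how often each variable appears among the good models. The data, the use of AIC, and the tolerance of 2 are arbitrary choices for illustration, not anything from the original program:

        from itertools import combinations
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        n, p = 200, 8
        X = rng.normal(size=(n, p))
        y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)  # only x0, x1 matter

        # Fit all 2^p - 1 non-empty subsets and record each model's AIC.
        models = []
        for k in range(1, p + 1):
            for subset in combinations(range(p), k):
                fit = sm.OLS(y, sm.add_constant(X[:, list(subset)])).fit()
                models.append((fit.aic, subset))

        best = min(aic for aic, _ in models)
        good = [s for aic, s in models if aic <= best + 2.0]  # all "good enough" models
        print(f"{len(good)} good models out of {len(models)}")
        for j in range(p):
            share = sum(j in s for s in good) / len(good)
            print(f"x{j}: appears in {share:.0%} of good models")

    A regressor that shows up in nearly all of the good models earns the extra confidence described above; one that appears in only a few of them looks more like an artifact of which "good" model you happened to land on.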

    To my dismay, few researchers would submit grant proposals on near-optimization, even though I announced I wanted to fund research on that topic. NSF, take note: you should be funding this topic.



    ------------------------------
    Michael Lavine
    ------------------------------



  • 3.  RE: Improving student outcomes in math/Is reproducibility even a possibility presentations

    Posted 03-12-2024 12:20

    The first ASA presentation I gave was to the Ann Arbor chapter, on optimization. As we learn in an operations research class, the chapter on sensitivity analysis tells us that we can change, at most, one coefficient at a time. But when we look at, say, a linear regression model Y = B0 + B1*X1 + B2*X2 + B3*X1^2 + B4*X1*X2, all those betas are estimates, not fixed values. Something I found is that if, say, X2 is dichotomous, then under some sets of betas that could be part of the model (each beta drawn from a normal distribution with mean Bn and its standard deviation), the optimal value of X2 WILL oscillate between the two classes.
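
    A minimal numeric sketch of that oscillation, with entirely made-up coefficient means and standard deviations: treat each beta as Normal(Bn, SDn), draw plausible sets of betas, and check which level of the dichotomous X2 maximizes Y at a fixed X1.

        import numpy as np

        rng = np.random.default_rng(42)
        # Hypothetical estimates and standard errors for
        # Y = B0 + B1*X1 + B2*X2 + B3*X1^2 + B4*X1*X2.
        B_mean = np.array([10.0, 2.0, -0.5, -1.0, 0.8])
        B_sd = np.array([0.5, 0.4, 0.6, 0.3, 0.5])

        def yhat(b, x1, x2):
            return b[0] + b[1]*x1 + b[2]*x2 + b[3]*x1**2 + b[4]*x1*x2

        x1 = 1.5  # a fixed operating point for X1
        draws = rng.normal(B_mean, B_sd, size=(10_000, 5))  # plausible beta vectors
        share = np.mean([yhat(b, x1, 1) > yhat(b, x1, 0) for b in draws])
        print(f"X2 = 1 is optimal in {share:.0%} of plausible models")

    Anything strictly between 0% and 100% means the "optimal" class of X2 flips depending on which plausible betas you draw.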

    In the presentation I gave, I discussed an experiment I did with my car to get the maximum MPG. The results of the model and the "optimal solution" suggested that turning the AC up full blast was best. (In reality, the AC set to 1 or opening all the windows really is the best.) I looked at the corner points of my design space and found that if we drop the assumption that Y = 10.00 is meaningfully better than Y = 9.99999999999999999999, stop pretending that confidence intervals AND prediction intervals do not exist, and allow the betas to roam, then some sets of corner points were optimal, say, 18% of the time, others 24% of the time, others 5%....

    I know that is part of what is going on here. In this case, it has to do with the small changes to the model's coefficients induced by having slightly different data points.
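
    To make the corner-point tally concrete, here is a sketch with the same made-up coefficients as above: draw betas, evaluate Y at every corner of a [-1, 1] x {0, 1} design space, and count how often each corner wins. (The design space and the percentages that come out are illustrative, not the MPG experiment's actual numbers.)

        from itertools import product
        import numpy as np

        rng = np.random.default_rng(7)
        # Same hypothetical model: Y = B0 + B1*X1 + B2*X2 + B3*X1^2 + B4*X1*X2.
        B_mean = np.array([10.0, 2.0, -0.5, -1.0, 0.8])
        B_sd = np.array([0.5, 0.4, 0.6, 0.3, 0.5])

        corners = list(product([-1.0, 1.0], [0.0, 1.0]))  # (X1, X2) corner points
        wins = dict.fromkeys(corners, 0)
        for b in rng.normal(B_mean, B_sd, size=(10_000, 5)):
            vals = {(x1, x2): b[0] + b[1]*x1 + b[2]*x2 + b[3]*x1**2 + b[4]*x1*x2
                    for x1, x2 in corners}
            wins[max(vals, key=vals.get)] += 1

        for (x1, x2), w in wins.items():
            print(f"corner X1={x1:+.0f}, X2={x2:.0f}: optimal {w/10_000:.0%} of the time")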



    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------