The Ann Arbor ASA group has quite a few folks who would consider themselves Data Scientists. It also has groups of traditional biostatisticians and applied statisticians. It's interesting when we get into our monthly discussions. I fit in with the applied statisticians group and am making headway into the Data Science realm.
I would think that within statistics itself, there is a bit of a division between "mathematical statisticians", "biostatisticians" and "applied statisticians". For example, I took Design of Experiments and Advanced Design of Experiments. I allow my data to talk to me. Quite often, I find "significant" interactions among my factors. Usually, there are more interactions than main effects. To the other applied statisticians and the data scientists, interactions are normal, everyday occurrences. To the biostatisticians, interactions are appalling. The "mathematical statisticians" question whether or not you really need all those interactions.
When we have had presentations on biomedical data analysis (patient outcomes, etc.), the biostatisticians tend to make simple models that intentionally leave out some factors (like doctor, hospital, lab, # surgeries the doctor had before the patient, time of surgery, etc.). The QC/Reliability statisticians and industrial engineers in the group ask, "Why didn't you include ...?" For an IE, human factors (HF) play an important role in the quality of a product. Ignoring HF is a cardinal sin.
The mathematical statisticians tend to see things from their perspective, and usually teach the "proofs" courses in the stats curriculum. I've had some interesting conversations with these types of statisticians. They feel "proof is truth". When you confront them with real data based on reality, and you can show they are wrong, they tend to get mad. If you use an alternate perspective for the same problem, they get mad too.
When you add "data scientist" to the mix, along with the common definition that they are "statisticians that can program", you add another layer to that discussion. A lot of the companies in the area want "data scientists", and they claim a data scientist knows how to collect and manage the data they need from a server (IT) and analyze the data using various predictive analytics (statistics). They want someone with a degree in stats and a minor in MIS/IT/Comp Sci, or a major in Comp Sci and a minor in stats.
I think we do need to define what a data scientist really is. A "statistician that can program" is a bit too vague. I think all statisticians use R, SAS or some other language to write programs and analyze data. Judging by what the local companies ask for, they would not hire someone with only these skills as a data scientist.
I think a better definition for a data scientist is someone that understands:
1) the computer algorithms the software is using
2) the concepts of databases and data warehousing
3) the statistical methods available for analysis
4) a full toolbox of methods, not a limited selection of "popular" methods
A traditionally trained statistician can fulfill item 3 and part of item 4. (I don't know many statisticians that took Data Mining, Data Warehousing or a course on Database Systems.) A computer scientist can generally handle items 1 and 2 and part of item 3. (I know that a 5-page section on linear regression in a Data Mining textbook is not sufficient.) An MIS/IT person will be knowledgeable about items 1 and 2.
While there is some overlap between a computer scientist and a statistician, or between comp sci and IT, I know no one group is best. But there are areas where all 3 groups can get much better. Statisticians need to get better at handling and analyzing large data sets. Comp Sci needs to understand there is more to analysis than Random Forests and Neural Networks. IT needs to understand there is more to analysis than taking a mean and a standard deviation.
That might be how we can all come together. We can work with each other to make each other better.
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 07-12-2016 11:34
From: Charles Kincaid
Subject: Reaching out to Data Scientists
Hi Andrew,
Thanks for the response. Good thoughts. Yes, it will be beneficial at some point to understand the differences and similarities between the approaches. However, the first step is to be able to engage in that conversation. What ways can we reach out to Data Scientists, particularly those who are not part of the ASA or a similar organization right now?
It seems to me that we have to reach "across the aisle," so to speak, and invite them to the table. This has to be done in a non-confrontational, collaborative way, as allies. Then there has to be a reason for them to be interested in talking to us, some mutually beneficial reason for the discussion.
It seems to me that we can learn some tips for this if we understand ways that statisticians and data scientists work together now. What ways do they collaborate now? Of course, that's assuming that they do collaborate now.
Do you collaborate with data scientists? What makes that successful?
If you don't, how would you reach out to them to begin the discussion?
Thanks again. I look forward to your thoughts.
Chuck
------------------------------
Charles Kincaid
Engagement Director
Experis Business Analytics
chuck.kincaid@experis.com