ASA Connect


Should we change the name of the field of statistics to "data science"?

  • 1.  Should we change the name of the field of statistics to "data science"?

    Posted 09-12-2017 20:14
    Dear Colleagues,

    Attached is a short essay about changing the name of the field of statistics to "data science". The essay discusses significant advantages of changing the name and suggests that the disadvantages are minimal.

    Best regards,

    Donald B. Macnaughton
    MatStat Research Consulting
    donmac@matstat.com

    ------------------------------
    Donald Macnaughton
    MatStat Research Consulting
    donmac@matstat.com
    ------------------------------

    Attachment(s)

    pdf
    Macnaughton2017a.pdf   176 KB 1 version


  • 2.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 01:01
    Hi prefer same name 





  • 3.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 10:59
    I'd vote yes on changing the "ASA" name and scope to something more indicative of the practice of data science, but I'd vote no on renaming the field of mathematical statistics this way.




  • 4.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 03:21
    Data Science is rapidly becoming its own field and specialization. It involves statistics and so much more. That "so much more" is of some controversy within the statistical community. I have the sense that many statisticians feel data science is both a pollution and a dilution of statistics. 

    If you look up a lot of the jobs around me for a data scientist, few, if any, statisticians qualify. And I agree with that. The traditional statistics degree is becoming more and more obsolete, because statisticians refuse to change with the times. By statisticians redefining what a data scientist is, you'll make a confusing new area and discipline more confusing. 

    I'd be far more interested and concerned with the poorly defined terms used among different areas of statistics. For example, mixture models. Ask an econometrics expert, a Bayesian expert, and a design of experiments (DOE) expert what that is. You'll get two opinions and a DOE expert's factual definition;-) How many different names are there for mixed models, covariates, survival analysis, etc.? And why? 

    I understand why you'd want to do it. But, I doubt you'll get "buy in" from those that matter most... the employers.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 5.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 13:00

    Andrew, while I agree with your conclusion that statisticians redefining what a data scientist is would bring about mass confusion, I wholeheartedly disagree with your statement, "The traditional statistics degree is becoming more and more obsolete, because statisticians refuse to change with the times." While big data is a growing area in research, I find I am still overwhelmed with collaborations that have little to do with big data or analytics (in the more popular sense of that word). More importantly, I find that having a solid understanding of issues such as Berkson's Fallacy and Simpson's Paradox is vital in areas where folks are jumping ahead to find a signal in the noise of big data. Statistics is not just something that should be covered in one lecture entitled "all the basic statistics mistakes data scientists should avoid" (aside: giving a lecture like this was actually pitched to me!), nor should statisticians expect to cover all there is to know about big data from a Spark bootcamp.
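
    To make the Simpson's Paradox point concrete, here is a minimal sketch using counts chosen purely for illustration (the subgroup names, treatment labels, and numbers are assumptions, not data from this thread). Treatment A has the higher success rate within each subgroup, yet treatment B looks better once the subgroups are pooled, because the two treatments were applied to very different mixes of cases.

```python
# Illustrative counts only: (successes, trials) per treatment within each subgroup.
counts = {
    "small": {"A": (81, 87), "B": (234, 270)},
    "large": {"A": (192, 263), "B": (55, 80)},
}


def rate(successes, trials):
    """Success proportion for one cell."""
    return successes / trials


def pooled_rate(treatment):
    """Success proportion after ignoring the subgroup and pooling all cases."""
    successes = sum(counts[g][treatment][0] for g in counts)
    trials = sum(counts[g][treatment][1] for g in counts)
    return successes / trials


# A beats B inside every subgroup...
a_wins_each_group = all(
    rate(*counts[g]["A"]) > rate(*counts[g]["B"]) for g in counts
)
# ...but B beats A in the pooled table.
b_wins_pooled = pooled_rate("B") > pooled_rate("A")

for g in counts:
    print(g, round(rate(*counts[g]["A"]), 3), round(rate(*counts[g]["B"]), 3))
print("pooled", round(pooled_rate("A"), 3), round(pooled_rate("B"), 3))
print("reversal:", a_wins_each_group and b_wins_pooled)
```

    Someone who only sees the pooled table would pick B; conditioning on the subgroup reverses that choice.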

     

    To me, being a statistician is more than being able to garner information from data. It's more than being able to successfully argue why I decided to use a particular analysis...

    I work with folks in the field of informatics, statistics, mathematics, computer science... Depending on what is meant by "data science" – and there are multiple definitions of that as well... it's not just statistics that likes to have K-N mappings of definitions to terms – and the context, I might consider all of us "data scientists" whereas in certain contexts it is preferable to say specifically what you mean. (e.g. I was called a "data scientist" in a newspaper article, which I was fine with.. but I would never agree to using that term to describe me (or likely anyone else) in a grant proposal).

     

     

     

    Mary J. Kwasny, ScD

    Associate Professor of Biostatistics

    Northwestern University

    Feinberg School of Medicine

    Department of Preventive Medicine, Biostatistics Collaboration Center

    680 N. Lake Shore Dr., Suite 1400

    Chicago, IL 60611

    Ph: 312-503-2294
    feinberg.northwestern.edu


     

     

     






  • 6.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 10:14
    If we're talking about structured data, the fields of statistics and data science overlap a lot. However, statistics is the field that has originated and improved the analysis of such data, with the help of probability theory, over more than 200 years.
    On the other hand, unstructured data like texts, images, videos, etc. are new to statisticians and have traditionally been explored and improved by computer science.
    The deeper I dig into Data Science, the more I realize that to have good control over it I need to improve my math skills (topology, differential geometry, abstract algebra, measure theory, etc.) more than anything else, because that's the intersection of all these areas. For instance, one field that fascinates me a lot in dimensionality reduction is manifold learning. Now whether you call that statistics or data science is the least of my concerns!

    ------------------------------
    Statistician & Instructor of Big Data Analytics
    Toronto
    ------------------------------



  • 7.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 10:24
    I am talking more about the approach to data analysis.

    Data science has its theories and approaches. At the last JSM, they were openly mocked at multiple discussions and seen as invalid, because they don't conform to some of the ridiculous assumptions used in conventional stats. Rather than being looked at as an alternate view on analytics, Data Science often gets dismissed by traditional statisticians.

    If a statistician analyzes a data set, they follow a set of pre-determined rules for determining the quality of the model. How many of them involve looking at the data and assuming normality? Why not use some type of calculation for that like Durbin-Watson, KS normality, etc? (My best guess, no one did those types of calculations 70 years ago. Therefore, why start now?) I learned about Durbin-Watson tests over the objection of a few stats profs.  
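
    For reference, the Durbin-Watson statistic mentioned above is a short calculation on regression residuals. This is a minimal sketch, assuming the residuals are already available in observation order (the residual lists below are hypothetical). Note that it measures serial correlation in the residuals rather than normality: values near 2 suggest little first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual sequences, in observation order.
alternating = [1.0, -1.0, 1.0, -1.0]  # sign flips every step
persistent = [1.0, 1.0, 1.0, 1.0]     # same sign and size every step

print(durbin_watson(alternating))
print(durbin_watson(persistent))
```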

    When was the last time a statistician used training, validation and testing data sets in their models, or even a confusion matrix? That's all standard in data science. In 15 stats classes, I never even heard of any of those things. When I use those methods with traditional statistical methods, on textbook data, it becomes quite clear that the textbook models quite often aren't very good compared to some of the partition methods that are available.  

    Statisticians assume that models need to be simple and fear overfitting the data. In Data Science you learn to "improve" the model until you start to overfit it. (That's why there are the training, validation and testing data sets.) Data Scientists use those data subsets to determine when overfitting starts to occur. What does a statistician use to determine if something is overfit?  
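
    The workflow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are simulated (a sine curve plus noise), the model family is a simple piecewise-constant fit whose complexity is the number of bins, and the 60/20/20 split is arbitrary. Training error can only go down as bins are added, so the validation set is what signals where overfitting starts; the test set is touched once at the end.

```python
import math
import random

random.seed(0)


def make_data(n):
    # Hypothetical data-generating process: a smooth signal plus noise.
    xs = [random.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + random.gauss(0, 0.3)) for x in xs]


data = make_data(300)
train, valid, test = data[:180], data[180:240], data[240:]


def fit(points, bins):
    """Piecewise-constant model: predict the training mean within each bin."""
    sums, counts = [0.0] * bins, [0] * bins
    for x, y in points:
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in points) / len(points)
    # Bins with no training points fall back to the overall training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def mse(model, points):
    bins = len(model)
    return sum(
        (y - model[min(int(x * bins), bins - 1)]) ** 2 for x, y in points
    ) / len(points)


candidates = [1, 2, 4, 8, 16, 32, 64]  # increasing model complexity
models = {b: fit(train, b) for b in candidates}
train_mse = {b: mse(models[b], train) for b in candidates}
valid_mse = {b: mse(models[b], valid) for b in candidates}

# Choose complexity on the validation set, then report once on the test set.
best = min(candidates, key=valid_mse.get)
test_mse = mse(models[best], test)

for b in candidates:
    print(b, round(train_mse[b], 4), round(valid_mse[b], 4))
print("chosen bins:", best, "test MSE:", round(test_mse, 4))
```

    Classical tools such as cross-validation, AIC, or adjusted R-squared address the same question from within statistics.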

    Statisticians fear interactions and "bizarre" terms. Statisticians fear large VIF. (It's easy to "mean center" data and include lots of interactions in a model while maintaining a low VIF.)  Data Science doesn't care.
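
    The mean-centering remark can be checked numerically. Below is a minimal sketch, assuming two simulated, independent predictors with non-zero means (all values are hypothetical) and a model containing both predictors plus their product. With three predictors, each VIF is a diagonal element of the inverse correlation matrix, which has a closed form, so no linear algebra library is needed.

```python
import random
from statistics import mean

random.seed(1)
n = 500
# Hypothetical predictors with non-zero means.
x1 = [random.gauss(10, 1) for _ in range(n)]
x2 = [random.gauss(20, 1) for _ in range(n)]


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def vifs(a, b):
    """VIFs for the columns a, b and a*b, from the 3x3 correlation matrix."""
    ab = [u * v for u, v in zip(a, b)]
    r12, r13, r23 = corr(a, b), corr(a, ab), corr(b, ab)
    det = 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    # Diagonal of the inverse correlation matrix = cofactor / determinant.
    return [(1 - r23 ** 2) / det, (1 - r13 ** 2) / det, (1 - r12 ** 2) / det]


raw = vifs(x1, x2)

# Mean-center each predictor before forming the interaction term.
m1, m2 = mean(x1), mean(x2)
c1 = [u - m1 for u in x1]
c2 = [v - m2 for v in x2]
centered = vifs(c1, c2)

print("raw VIFs:     ", [round(v, 1) for v in raw])
print("centered VIFs:", [round(v, 2) for v in centered])
```

    Centering changes the parameterization, not the fitted values, which is why the large raw VIFs here reflect how the terms are coded rather than a lack of information in the data.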

    Statisticians "clean" data. Statisticians have multiple methods for dealing with missing data, all of which bias the data and can lead to improper models. Data Science just doesn't care.

    Statisticians like to tell the data what to do. If they have an outlier, they might remove it because it doesn't fit into their assumptions about how the data should behave. They will hand pick variables they believe are important. Data Science listens to the data and allows automated algorithms to make those decisions.   

    Statisticians want to import data tables and data sets onto their desktop for analyses. Statisticians use their stats software to manipulate data tables. When the data is too large to fit on their desktops, they start using sampling methods. A data scientist can use their cell phone to analyze all of their data. They understand the servers that store the data and the database systems and stats programs on those servers are the appropriate methods for processing, joining, manipulating and analyzing data.

    When a statistician talks about a "database system" or "databases" they are referring to Proc SQL. Data Scientists talk about Oracle, SQL Server, Hadoop, etc. Statisticians talk about 200,000 rows of data as "Big Data".... cuz that's about all that will fit on a typical desktop. Data Scientists think 1,000,000+ tuples of data is small.

    When a Data Science model doesn't work, statisticians point it out as a "failure of data science". When a statistical model doesn't work, the statistician points it out as "random variability". 

    Statisticians and Data Scientists can learn a lot from each other. Statisticians and Data scientists do a lot of the same things. 

    The biggest difference between statisticians and data scientists is statistical theory and algorithms are based upon pencil and paper. Data Science assumes computers exist.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 8.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-25-2017 13:12
    Andrew Ekstrom,
    I think you overfit your defense of data science.  You have created incredible straw men in an effort to show that data science is superior.  
    Here's a simpler explanation of what is happening.  Statistics is hot right now.  More and more groups want in on the money.  Hence, the birth of data science, which is outside of traditional statistics.
    Always follow the money.


    ------------------------------
    Thomas Ilvento
    University of Delaware
    ------------------------------



  • 9.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-26-2017 09:45

    Colleagues,

     

    Thomas Ilvento is not right – he is EXACTLY right!

     

    thanks,

     

    cheers,

    joe

     






  • 10.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-27-2017 09:15
    By 'follow the money' there seems to be a tinge of a nefarious implication.  A bit misplayed I think.  If the money were 'followed', it would originate in demand for vs supply of 'data science' resources - those who can toil in any, preferably many, of the facets of the activity discussed on this forum.  We should all be celebrating this trend! - a market realization that data-driven decision-making is valuable and worth paying real money for...

    ------------------------------
    Tim Keyes
    Principal
    Evergreen Business Analytics, LLC
    ------------------------------



  • 11.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-02-2017 11:04
    Tim,
    I am on campus.  I have watched a number of new Data Science programs emerge across the country.  They follow a new model in higher education where M.S. degrees are designed to make money, with students who pay for their degrees.  This is an MBA model.  Some of these new Data Science programs are a collection of courses - choose two courses from this department, two from this one and so forth and whamo!  Now you are a Data Scientist.  I would like to suggest we do better than this.
    I am absolutely in favor of markets.  Everyone, including universities, needs to be challenged by the market every day.  However, I know what a Statistics degree means and what it entails.  I know what a student with this degree should know.  The field of Data Science is not as well defined.  I have talked with industry reps who are hiring people.  A lot of claims are being made and I wonder if these degrees can back them up.  Of course the market will decide.  
    I believe the field of data science, whatever that is at present, is evolving.  As it evolves, it is important to determine what Data Science is and what it isn't as an academic field.

    ------------------------------
    Thomas Ilvento
    University of Delaware
    ------------------------------



  • 12.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 06:00
    Current data science degree programs and statistics programs seem to have very different curricular focuses. As far as I can tell, data science degrees in most cases have a heavier computer programming focus while statistics is more mathematically focused (though there certainly are computer programming components in the curriculum at the higher levels).

    In order to answer the posted question I believe it is necessary first to answer: what are the qualifications of someone that best encompasses all the qualities of a newly defined statistician (or data science specialist) and how do potential candidates achieve the desired background skills and qualifications? Generally speaking, statistics in high school, for example, is a very small part of the curriculum and is currently integrated through the math curriculum in microscopic amounts or taught as a separate course, like AP statistics, at the very end of high school. Would renaming and therefore redefining the field of statistics result in a greater or smaller emphasis placed on it in grades K-12? High schools typically require 4 years of math and little if any computer science course work. Does redefining it as Data Science cause it to be moved into the field of computing and therefore further reduce the emphasis in K-12? If not, how would the average layperson know that? What changes, if any, would need to take place to adequately prepare the K-12 educator to teach data science compared to statistics? The field of education has generally been slow to adopt major changes. Most of my building still has traditional chalk boards, with the majority of teachers using a basic projector on a movable cart, for example.  A similar situation exists in most of the schools I have visited or worked in. If redefining the field also requires greater access to computers at the K-12 level without changes in associated funding, would that limit the field entirely to those in privileged communities?  Does it make the field even less accessible to minorities and women, for example? Perhaps or perhaps not.

    Whether or not the name changes there does seem to be a need to better define how one would be able to realistically pursue a career in the field before going through such a change. That conversation would need to include creating a clear implementable plan for how to achieve the ideal background and leadership involvement across multiple disciplines.

    ------------------------------
    Flora Quevedo
    Math Teacher
    Reading High School
    ------------------------------



  • 13.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-28-2017 15:11
    Why not call statistics "Statistical Science"? The broad public doesn't recognize the scientific character of statistics but seems to be taken with the "science" word in "data science", which is less of a science than statistics. This simple fix would do the trick of informing the public and differentiating us from data science at the same time. (I haven't read all the strings. Perhaps someone already suggested this.)

    ------------------------------
    Robert Riffenburgh
    Naval Medical Center
    ------------------------------



  • 14.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-01-2017 12:38

    I have a different opinion from Robert Riffenburgh's suggestion to call Statistics "Statistical Science":  I find the "Data Science" term distasteful enough but "Statistical Science" is even worse.  Why not suggest to Physicists that they should start calling their field "Physical Science" ?

     

    Jim Baldwin

     

     










  • 15.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-03-2017 13:31

    Your point is well taken, but counter examples can usually be found anywhere (except in certain beautiful areas of mathematics). For example, if bias against racial groups makes one a racist and bias against a sexual group makes one a sexist, does bias against a social group make one a socialist? The question is if the benefits outweigh the costs (in your case negative emotional feelings)? I am not invested in this as a cause; it was just an idea to be considered.

     






  • 16.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-01-2017 12:39





  • 17.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-01-2017 12:40
    I am a statistician who is often hired by the Chief Data Science Officer of a large corporation both to analyze data and sometimes to evaluate work performed for him by data scientists, work which he recognizes as possibly questionable. On occasion I have found his suspicions to be well founded. (As one example, the model that had been generated for him could predict with 100% accuracy, and he was well versed in this particular phenomenon so he knew that such predictions would be impossible.)

    I believe he hires me because I have a very broad scope of knowledge covering all areas of statistics, and he is smart enough to recognize when such knowledge is needed. My particular area of interest during this past decade has been measurement. As I age and "season," I have come to realize the incredible importance of obtaining valid and reliable measurements. But also, as a statistician, I try to keep up with the increasing discoveries being made in our field. Statistics is not static, but it instead is a very dynamic field.

    I believe that data scientists and statisticians should work together to achieve the best results, but I continue to believe that their core approaches to a problem differ. Thus I believe that each group should retain title names that distinguish between them.






  • 18.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-01-2017 12:41

    Good point!

    Thus there should be "Statistical Science" and "Mathematical Science" along with "Computer Science", which are of course all related but are three distinct disciplines.


    -----------------------------------
    David Bernklau
    (David Bee on Internet)
    -----------------------------------









  • 19.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-02-2017 09:46


    I am not sure if this suggestion is right, but one thing is for sure: whenever I am working, I tend to think more about how I will collect, analyze, etc. the data (population, its relevance, sample size, etc.) than about just the data in the abstract. So the suggestion that statistics should be equated to data science is, in my view, not necessary. We are two different people. Although apples and oranges are both fruits, it does not mean that they have the same characteristics.





  • 20.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 08:48
    Thank you for initiating this thread.  We need to continue the conversation about how the field of statistics needs to evolve in the face of the technological innovations of the day.  

    I don't think we can change the name of the field to data science any more than we can sprinkle data science vocabulary and topics into traditional introductory statistics and call it introduction to data science.  Although statistics and data science share common skill sets, they have fundamentally different starting points and they are trying to do different things.  The essential vocabulary of data science is different from that of statistics. 

    Traditional introduction to statistics class is geared towards preparing students to engage in the practice of inferential statistics.  Most of the tools we introduce are geared towards the investigative process, where you start with a research question, design a study, gather the data, summarize the data and make inferences based on the data. We are starting from the presumption that data on the "population" or group of interest is not available so instead we will rely on samples. We specify our hypothesis, our methods, our decision criteria etc. up front.  This is still at the heart of the scientific process.

    We teach probability and sampling distributions so that students can understand the normality assumptions underlying inference and can check to make sure the assumptions have been satisfied. If the assumptions are not satisfied, we may use Simulation Based Inference or announce that non-parametric statistics will be taught at some point in the future in the event they are so interested. For the most part though, we just advise that they draw a big enough sample that they don't have to worry about the normality assumption.  In introductory statistics, descriptive statistics or "data analysis" is just a stepping stone to inferential statistics. 

    As technology has evolved, an abundance of data has become available. Tools to turn this data into useful information are evolving as well.   In Data Science, "the" data is the starting point of the analysis: massive volumes and variety of rich but "untidy" data that is generated at unprecedented velocity. There is no sample let alone random sample. There is a constant stream of continuously updating data. We look at snapshots of this data stream, find a set of graphical and other tools to represent that data, refine and generalize the model and then create a dashboard to use our model to summarize the new data that started arriving right after we took the snapshot.  

    Fundamental to data science are reproducibility, metadata, a rich variety of tools for data visualization, and exploratory data analysis.  To predict based on a mean square error from a training set, like we teach in introductory regression, would be considered naive. In data science, one is concerned with overfitting, the bias-variance tradeoff, and therefore the distinction between the training set, validation set and test set.    

    I look forward to the continuation of the conversation!

    ------------------------------
    Bernadette Lanciaux
    RIT School of Mathematical Sciences
    ------------------------------



  • 21.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 09:16

    In a nutshell, this says it all quite well:

    "In introductory statistics, descriptive statistics or "data analysis" is just a stepping stone to inferential statistics."

    [BTW, in Stat 101 courses I teach, Lesson One begins with something like "Statistics is the science of data and the (common) language of science."]


    ------------------------------
    David Bernklau
    (David Bee on Internet)
    ------------------------------













  • 22.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 13:15

    I briefly read the article. The author claims that (1) data (even if imagined) comes first, and statistical theory follows. The author also argues that (2) changing the name of our field to data science would not mean we necessarily change the goals or activities of our field. 


    While I can almost agree with (1), though not wholeheartedly, (2) does not follow. First, we must understand the current usage of "data science" and while there is certainly overlap with the current usage of "statistics", there are different skill sets at play and most assuredly, different goals. 


    Where would theoretical statisticians fall in this overlap between statistics and data science? While perhaps there are some programming skills required by theoretical statisticians to provide data examples of their conjectures, I don't see much overlap at all, but perhaps I am being short-sighted.


    Further, while I think that some statisticians are also data scientists, I have a hard time imagining data scientists as statisticians. In my view, we need complete overlap if we are to call statisticians data scientists, and vice versa.


    Robyn



    Robyn L. Ball, Ph.D.

    Senior Biostatistician

    Quantitative Sciences Unit

    Stanford University

    http://med.stanford.edu/qsu.html






  • 23.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 09:43

    Changing the name of the field of Statistics to the field of Data Science is only slightly more appealing than changing the name of the field of Statistics to the field of Numerology.

    (In other words, the field of Statistics encompasses much more than data.)










    ------------------------------
    David Wilson
    ------------------------------



  • 24.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 14:07
    Edited by Kelly Zou 09-13-2017 14:28

    Data Science is generally Computer Science and visualization heavy. Changing the name from Statistics to Data Science may not distinguish between these two disciplines and set apart different backgrounds needed.

    Inter-disciplinary collaborations may be attractive. One can be good at multiple areas, besides Mathematics, Informatics, Physics and quite importantly, the subject-matter area.

    Speaking of starting coding earlier, the US may lag behind other countries. For example, please see the following article on BBC, UK:

    A computing revolution in schools

    "As children from five upwards return to school, they are going to have to start learning how to program - or to 'code' to use the trendy term which seems to upset some old-school programmers. This is the result of the new national curriculum for computing that is being introduced in England this term."

    ------------------------------------------------------------
    Kelly H. Zou, PhD, PStat, ASA Fellow
    Chair, ASA SPAIG Committee
    Chair-Elect, ASA HPSS Section
    ------------------------------------------------------------



  • 25.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 09:04

    Before one considers changing the name of the field of statistics to "Data Science" it might be helpful to recognize that data science IS NOT A SCIENCE!  Please revisit Popper, Kuhn, and the general body of work in philosophy of science – have a great day, everyone ...

     

    thanks,

     

    cheers,

    joe viscomi

     






  • 26.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-15-2017 13:06

    Thanks Joe,

         This viewpoint leaves me wondering about some related questions:

    1.  Following this line of thought, would one label "Statistics" a Science?

    2.  Same for "Computer Science"?

    3.  What about "Operations Research"?

    Best,

    Steve

     






  • 27.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 13:07
    Statistics is not data science.
    There are many academics who are theoretical statisticians, probabilists, etc., who are important and valuable members of ASA, but rarely, if ever, touch data.
    ASA provides sections for individual interests and there are also other professional organizations with more detailed focus. 
    ASA is the umbrella organization.

    David

    --
    David R. Bristol, PhD
    President, Statistical Consulting Services, Inc.
    1-336-293-7771





  • 28.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 14:04

    It really depends on the definitions.  At this point, statistics and data science are two different fields.  Certainly, the two have a lot of overlap, but neither one contains the other totally.   For example, asymptotic theory in statistics does not seem to fit in current data science, while handling a 100T data set is more of a computer science issue.  By analogy, we cannot say that probability theory is a part of statistics.  In early times, the meaning of statistics was restricted to information about states, and the modern term still bears that root.






  • 29.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-13-2017 16:17
    Dears,
    My answer to the question: Absolutely not!
    Yes, Statistics deals with data, but Statistics is more than data.
    Thank you for your concern.

    ------------------------------
    Mohammed Shayib
    Associate Professor
    Prairie View A & M University
    ------------------------------



  • 30.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 02:03
    Dear,
    Data science uses statistics, but not the other way around. 
    Many disciplines use statistics. For example, econometrics uses statistics to solve economic problems, and actuarial science, which is somewhat close to data science, uses statistics to solve insurance problems; the only difference is that one focuses only on insurance while the other is heavy on engineering and computer science. So I would rather see these disciplines appreciate the contribution and usefulness of this rich area (statistics) than change its name. And if you go back through the history of statistics, the people who developed most of the tests we use today came from different disciplines, from economics and biology to even history.


    ------------------------------
    Zakarie Hashi
    ------------------------------



  • 31.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 10:00
    How should we collect data?  How should we extract meaning from data?  Only statistics offers a completely rigorous approach to answering such questions.  It's our "brand", so to speak.  I wouldn't want to mess with that.

    ------------------------------
    Daniel F. Heitjan, PhD
    ASA Fellow and Life Member
    Carrollton, TX USA
    ------------------------------



  • 32.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 12:40
    Every area of science offers answers to those questions! In particular, physics uses data to generate deterministic models. Then it uses new data and outliers to modify that original model to accommodate those outliers.

    In Data Science and Statistics, the models are stochastic. Models generated by both disciplines are just as rigorous, and both are just as likely to fail or succeed. Both are open to abuse and misuse. Let's not fool ourselves into thinking one is better or more accurate than the other, because most statisticians won't like the results. In competitions, data mining tends to win. Data Science is the future of analytics. It is becoming the standard in current analytics.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 33.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-14-2017 13:59
    I don't have a solution for the issue of "branding" the field; however, I have a proposal for a "one-sentence elevator pitch of where statisticians add the most value" (the whole being-stuck-in-an-elevator-with-Steve-Jobs scenario, whether historically true or not):

    Ideas:

    1. I like the summary at the very beginning of "Stats: Data and Models" (4th Edition) by De Veaux, Velleman, and Bock that statistics is ultimately about variation. As I heard often quoted at Google: "Engineers know what a mean is, but analysts/statisticians know what a standard deviation/error is."
    2. Furthermore we deal in inference: making statements that generalize to some greater unknown mechanism/population based on often imperfect and incomplete observations. One could argue this definition is too broad, as many other fields do the same and others do this with qualitative data. So we need something that suggests inference that specifically fits into a mathematical / probabilistic / quantitative framework.
    Result: Statistics is the field that specializes in sampling-based quantitative inference and the study of variation.
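
    As a minimal sketch of the mean versus standard deviation/error point in idea 1, the snippet below compares two made-up samples (both lists are hypothetical values chosen for illustration) that share the same mean but differ greatly in variation. It uses only Python's standard library.

```python
import math
import statistics

# Two hypothetical samples with the same mean but very different variation.
steady = [49, 50, 51, 50, 50, 49, 51, 50]
erratic = [10, 90, 35, 65, 50, 5, 95, 50]

def summarize(sample):
    """Return the mean, sample standard deviation, and standard error of the mean."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)       # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(sample))    # standard error of the mean
    return mean, sd, se

for name, sample in [("steady", steady), ("erratic", erratic)]:
    mean, sd, se = summarize(sample)
    print(f"{name}: mean={mean:.1f}, sd={sd:.2f}, se={se:.2f}")
```

    Reporting only the mean would make the two samples look identical; the standard deviation and standard error are what separate them.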

    So data visualization, for example, is done by both statistics and data science, and thus is not solely in statisticians' wheelhouse. Nor is the idea of modeling+error unique to the field of statistics.

    As an aside, another related exercise is: If a modal computer scientist and a modal statistician both taught a machine learning course, what would be orthogonal about the statistician's course over the computer scientist's course?

    ------------------------------
    Albert Y. Kim
    Amherst College
    ------------------------------



  • 34.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-15-2017 08:57
    As many have pointed out before, data science encompasses some statistics, but is more IT heavy and has a heavier focus on visualization. Data scientists typically care more for algorithms and analyses, and will only seldom come up with new theory.

    There is a need though for more intense communication between both fields. A class of data scientists exists who have expertise in machine learning, yet are unaware of basic multivariate statistics, or likewise, develop algorithms on anomaly detection while being unaware of the entire literature on robust statistics. On the other hand, some statisticians regard machine learning algorithms such as neural networks as computational tools that lack a solid theoretical foundation and therefore, just disregard them. I think there is still a big upside potential in merging the best from both fields.

    That said, statisticians should also be reluctant to just ride the wave of data science for the mere reason that it's presently flashy to do so. Statistics should keep the focus on the math and know that good methods will find successful applications. There presently is a trend in the statistics community that any novelty should be linked with big data. For instance, the Journal of Computational and Graphical Statistics presently rejects papers because the data examples shown in them are too small. What happened to small sample statistics? Statistics grew around samples of small to moderate sizes, and analysis of such data is still very relevant today.

    Statistics should use data science as an opportunity to grow, not let itself be replaced by it.

    ------------------------------
    Sven Serneels
    Data Science Sr Manager
    BASF Corp.
    ------------------------------



  • 35.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 09:43
    I have to say that these posts are becoming unbearable to read, with mass generalizations and inaccuracies. To say that statisticians disregard machine learning, follow a set of pre-determined rules, always remove outliers, and do not use separate training, testing and validation data sets is not at all representative of the work most statisticians do (and certainly not of any that I work with).

    Data science should continue to grow as another field related to, but different from, statistics, and statisticians should welcome the complementary work and pursue collaboration with those in computer science, biomedical informatics, etc.

    Although I am personally moving more into data science than I have been, I really do not understand the perceived threat to statistics. Is a data scientist without advanced formal training in statistics going to sit down with a nephrologist and understand how to best model trajectories of GFR over repeated time points to characterize the relationship between imaging measures and decline in kidney function? Is a data scientist going to meet with an immunologist and determine the best statistical models for treatment effects, and group by time interactions, for a range of biomarkers and other outcomes in a mechanistic study where they seek to understand underlying relationships?

    Let's spend more time inviting, rather than discouraging collaboration and mutual interests. And please - no name change!! I agree totally with the previously expressed concern that doing so would cause mass confusion.
     


    ------------------------------
    Douglas Landsittel
    Professor of Biomedical Informatics, Biostatistics, and Clinical and Translational Science
    University of Pittsburgh-School of Medicine
    ------------------------------



  • 36.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 09:22
    The word analytics seems to be just a way for computer scientists to arrogate to themselves the work of statisticians.

    Here's the Wikipedia-based definition of analytics:
    Analytics is the discovery, interpretation, and communication of meaningful patterns in data.

    I'm afraid that exploratory data analysis, something statisticians have been doing for decades, has been around much longer than the use of the word analytics in this context. Statisticians must also interpret and communicate findings.

    As I see it, Data Science brings two things to the table:
    1) Training and experience managing huge sets of data
    2) Large-scale applications of statistical methods to non-probabilistic samples

    #2 might not be so bad if, for example, the sample is really the population of interest. If, for example, Amazon is trying to figure out what items to show to a person after they have purchased something else (based on the probability of that person buying an item from the list), then Amazon isn't really interested in predicting for individuals who aren't already Amazon members (at least I wouldn't think so.)

    ------------------------------
    David Wilson
    ------------------------------



  • 37.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-20-2017 23:14
    There are many interesting points here. Confessing that I have not had time to read all of this chain, I am not sure the following concern has been addressed. The ASA, like other ethically minded professional-academic societies, has as a regular practice illuminated ethical concerns in the use of statistics and social issues calling for ethical policy review, where data and their analysis are concerned. This includes issues in the sources and collection of data, research methods and issues, and in the use of data analysis in research and the direction of social policy or action, as these may affect civil rights, human rights, socio-economic equity, privacy, and other values of liberal, open, democratic society.

    On the other hand, much of Big Data appears to be collected without informed consent, sometimes for purposes against the interests of the sources and subjects of the data, without much structural ethical supervision, and often for narrowly commercial and exploitative ends contrary to the values enumerated above. A good treatment of the dangers in the use of such data is Cathy O'Neil's 2016 book, "Weapons of Math Destruction." This raises a question about the relations between the discipline of statistics and its representatives, on one side, and the advocates of Big Data and other such 'epiphenomena' (as effectively described by one of the correspondents I read here) and of AI that automates data-driven decisions and their social effects, on the other. Without the mediation of statisticians conscious of an ethics of professional practice that is greater than a self-serving economic end, to what extent would either inclusion or arm's-length differentiation serve the set of principles that sets a profession apart from being a mere instrument of economic exploitation, or of administrative tyranny that wreaks havoc on the powerless, needy, and uninformed masses who make up the vast majority?

    ------------------------------
    Andrew Tierman
    Saginaw Valley State University
    ------------------------------



  • 38.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-16-2017 17:20
    As others have commented, the fields called "statistics" and "data science" have considerable overlap, but each covers subjects not covered by the other. I think that wherever possible, using the umbrella phrase "statistics and data sciences" would be best. The university from which I am retired does indeed have a Department of Statistics and Data Sciences (https://stat.utexas.edu/). Although this arose because of historical considerations*, it now seems like it was a good idea to bring the two fields together.

    * For various reasons, there was no department of statistics; the Dean of Natural Sciences  thought there should be one; the administration was not willing to create a new department at the time; and there was also pressure to create an academic unit on scientific computing. The Dean was eventually able to push through a "Division" of Statistics and Scientific Computing, and after a change of administration, this was eventually upgraded to a department.

    ------------------------------
    Martha Smith
    University of Texas
    ------------------------------



  • 39.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 13:01
    Dear All, 

    The 2015 ASA Board Statement on the Role of Statistics in Data Science may help to inform this discussion, http://magazine.amstat.org/blog/2015/10/01/asa-statement-on-the-role-of-statistics-in-data-science/, along with Ron Wasserstein's accompanying blog entry: The Role of Statistics in Data Science – An ASA statement

    Let me also take the opportunity to make sure you are aware of the whitepaper from the 2016 Department Chairs workshop, Success, Opportunities, and Challenges for Statistics and Biostatistics in the Data Science Era. Some of the videos of the presentations may also be helpful: https://www.amstat.org/ASA/Meetings/Department-Chairs-Workshop.aspx. I particularly recommend the presentation of CMU Dean of Computer Science Andrew Moore on how statisticians, computer scientists, and others can work effectively together to solve challenging data science problems: https://www.youtube.com/watch?v=LmhSCVURtM4&index=3&list=PL9G4n1wtRTDTqwdSu8GhoqYIEIDkfHYMi

    Best,
    Steve

    ------------------------------
    Steve Pierson
    Director of Science Policy
    American Statistical Association
    ------------------------------



  • 40.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-18-2017 22:35
    From some of the replies so far, I wonder if I am in the same field as some of my colleagues, even though we do some of the same things.

    Data Science is the study of data. One takes data as a given, and then sees what one can learn from it.

    Statistics, historically, does not take data as a given. And in my view this has been our central distinguishing characteristic. We (applied statisticians at any rate) study real-world processes, not data. Any given data set is merely an example of a possible dataset that could be generated, an epiphenomenon, not a phenomenon and certainly not a noumenon. We regard data as not being entirely real.

    For this reason, study design - how to generate data relevant to what we are trying to study in an unbiased and unconfounded way - has been critical to our profession. The fact that there is now a lot of data lying about that was never generated by design for a particular purpose doesn't make the need for design irrelevant, it makes it even more important.

    The statistician looks past the data itself to study the process by which that data was generated and evaluate its relevance to our question, its potential for bias and confounding, etc.

    That degree of critical skepticism of data, which comes from not regarding it as the fundamental reality, is in my view entirely missing from a great deal of what passes for data science. The data scientist generally does not start with a question and figure out what process needs to be developed to generate data that reliably answers it. The data scientist generally starts with data that happens to be lying about and tries to see how it can illuminate the question at hand, all too often without considering the process that generated it or how that process may have affected the data's relevance.

    More than half a century ago, Joseph Banks Rhine discovered that when two people, separated by whatever distance, select cards, the cards they choose have persistent measurable correlations over time. The reason is not so much telepathy as that humans do not select cards in genuinely random fashion. Asked to select a random sequence, humans will produce fewer consecutive repeats of the same card and fewer apparent patterns than a true random selection would generate. Two human pseudorandom generators will therefore be correlated, despite the absence of any ESP.

    It is perhaps the quintessential task of statisticians to understand what was actually happening in the Rhine experiments and hence what their results mean, and in particular to challenge whether key assumptions (like "random" sequences) are being conformed to and what deviation from assumptions means in terms of real-world interpretation. To do ESP experiments unconfounded, the sequence of cards has to be generated by a randomizer, not by the human test subjects. When this happens, the apparent ESP effect disappears.
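
    A minimal sketch of one part of this argument, the claim that people produce fewer consecutive repeats than a true randomizer: the simulation below compares a uniform randomizer with a hypothetical "human-like" chooser. The chooser and its 0.9 repeat-avoidance probability are assumptions made purely for illustration, not a model taken from the Rhine experiments, and the five symbol names are likewise illustrative.

```python
import random

# Five card symbols (illustrative labels for a five-symbol deck).
SYMBOLS = ["circle", "cross", "waves", "square", "star"]

def randomizer_sequence(n, rng):
    """Draw n symbols independently and uniformly, as a mechanical randomizer would."""
    return [rng.choice(SYMBOLS) for _ in range(n)]

def human_like_sequence(n, rng, avoid_repeat=0.9):
    """Hypothetical chooser: with probability avoid_repeat, refuses to repeat the previous symbol."""
    seq = [rng.choice(SYMBOLS)]
    while len(seq) < n:
        if rng.random() < avoid_repeat:
            candidates = [s for s in SYMBOLS if s != seq[-1]]
        else:
            candidates = SYMBOLS
        seq.append(rng.choice(candidates))
    return seq

def repeat_rate(seq):
    """Fraction of adjacent pairs in which the same symbol appears twice in a row."""
    pairs = list(zip(seq, seq[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)

rng = random.Random(2017)  # fixed seed so the run is reproducible
n = 20000
random_rate = repeat_rate(randomizer_sequence(n, rng))
human_rate = repeat_rate(human_like_sequence(n, rng))
print(f"randomizer repeat rate: {random_rate:.3f}")  # expected value is 1/5
print(f"human-like repeat rate: {human_rate:.3f}")   # far below 1/5 under this assumption
```

    A check like this on the sequence itself, before looking at any hit rates, is one way to test whether the "random" assumption is actually being met.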

    Statisticians are trained to spot issues like this, not to accept data at face value, to look through and behind it. Are data scientists trained to have this skepticism?

    Jonathan Siegel
    Associate Director Clinical Statistics





  • 41.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-19-2017 03:18
    Jon,

    I think it would be great if all statisticians took the view you are discussing. I just don't see it. I don't know many statisticians who are skeptical of their data. I know several statisticians who would rather NOT know where their data comes from, nor explore it to see what the data tells them. There seems to be a significant fear of finding things out that disagree with their harmonious view of the world. Most of my classmates, professors and professionals look at data as something they got in an Excel sheet (or something similar). If those students took Stat Quality Control and/or Design of Experiments seriously, they might have an opinion of data more like yours.

    For example, if statisticians who work at hospitals looked into running inter- and intra-laboratory comparisons for several of their common medical tests, I would hope they would see data the way you do. But if you try to explain to those statisticians that their data is biased and that the only thing they know for certain is that their data is WRONG, because they don't know how their data is biased, they'll defend their data as though it was God given. A simple gage R&R study should show them a new way of thinking about their data. But that disturbs their world view.

    On the other hand, what you described as a statistician is almost exactly what I see in every industrial engineer I know. They understand that the data is biased. They know that the systems generating the data are not optimal. They won't let, "We've been doing it this way for decades, why change now?" get in the way of finding things out and making things better.

    Oddly, understanding the processes that generate the data is a big part of the good Data Science degrees that are out there. They all require students to have a cognate area outside of math, stats and comp sci. They want the student to ask intelligent questions and be skeptical about the data. They want students to have some basic understanding of the system that generates the data.

    Meanwhile, when I told my MS Applied Stat adviser I wanted to take a couple courses in Database systems, he told me I needed to take Modern Algebra instead. I asked about taking a couple courses in molecular biology and a couple courses in bioinformatics from the comp sci department. He agreed that I could take bioinformatics and was disgusted that I would suggest taking molecular biology. (After all, what does molecular biology and genetics have to do with bioinformatics?;-) So I left the program. 

    I left a career in chemistry/biochemistry 10 years ago to get into statistics and applied mathematics. I figured knowing where the data comes from, the lingo and lab practices would make me a great statistician. I'd know and understand where the data comes from. I could ask the important questions about the data. In the 10 years since I left chemistry, I've held zero positions that analyze biochemical/chemical data. It's the fact that I question the data that has kept me out of those types of statistician jobs. I've been accepted by the industrial engineering and data science community.

    I understand where the data comes from. I openly question its validity.

    I want to know how the system works. I want to make it better.

    I refuse to fear finding things out that will shake my world. In fact, I welcome it.

    I refuse to take a myopic view of data and the world. I look for alternate points of view.

    For that reason, I'm now focusing on Industrial Engineering and Data Science.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 42.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-20-2017 09:48
    Edited by Subhadeep Mukhopadhyay 09-20-2017 09:48

    My mantra offers the following analogy: Statistics is to Data Science what Physics is to Engineering.



    --------------------------------------------------------------------------------------------
    Subhadeep(DEEP) Mukhopadhyay, PhD
    Assistant Professor of Statistical Science
    Temple University, Fox School of Business
    Speakman 335
    1810 North 13th Street
    Philadelphia, PA 19122-6083
    Phone: (215) 204-3309
    Fax: (215) 204-1501
    http://sites.temple.edu/deepstat




  • 43.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-20-2017 11:34
    There's room in the Big Tent for everyone. The Big Tent is data-driven decision-making in a learning environment. Whoever we need to do this, no matter what they're called, how they are trained, or how they approach their work, is welcome. Free admission. The more diversity the better, so long as minds are open. The realities of the learning environment itself will adjudge (predicted vs. actual) which approach is best.

    ------------------------------
    Tim Keyes
    Principal
    Evergreen Business Analytics, LLC
    ------------------------------



  • 44.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-21-2017 12:42
    I applaud the comments about how a statistician should look "behind" the data to how it was obtained, how reliable it is, what it represents, and hence what we can learn from it. Of course the profession (well, folks like R. A. Fisher) observed that the statistician should be involved in the project data collection design at the start. W. Gosset (aka Student) collected the data he had to analyze. My ideal statistician is deeply involved with the subject matter of application, and with science per se, and works as a member of a team. It worked for me.

    In model selection I keep having, it seems, to assert there is no true model because data do not come from a model. They come from reality + some sampling/measuring process.  The "model" is just a means to filter information, about reality, out of the data. If a data scientist, or anyone, simply accepts the data, at face value, as numbers to be "crunched," they are not doing good statistics. 

    I do not wish to change the name to data science. I would rather say statistical science. Make the emphasis on more than just the data, as such.

    ------------------------------
    Kenneth Burnham
    Colorado State University
    ------------------------------



  • 45.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-22-2017 09:29
    The argument of Statistics versus Data Science is a meaningless one. I believe Dr. Subhadeep Mukhopadhyay said it best:

    Statistics is to Data Science what Physics is to Engineering, except I would reverse the sequence as follows: 

    Data Science is to Statistics what Engineering is to Physics.

    The reason for the reversal is simple. One is the whole (Statistics or Physics) and the other is a part (Data Science or Engineering). The discipline of Statistics is not simply applied statistics (which probably could be equated with Data Science); it includes probability theory, stochastic processes, renewal theory, quality control, etc. Theories are developed in these areas and hypotheses are framed. They eventually give rise to methods for testing hypotheses. This is where Data Science comes in. I have practiced and taught both applied and theoretical statistics. My colleagues and associates never just analyzed data; we tried to understand the reason behind the experiment and its design. We tried to understand the deviations and improprieties in the conduct of the experiment. This type of multi-way communication often strengthens the hypotheses being tested, with the possibility of producing a better design (see the book by Box, who got this mantra from his father-in-law Fisher). Whoever thinks statisticians do not ask questions and try to understand the experiment is not a student of statistics.

    I have seen too many "Data Scientists" rely on software packages such as SAS and convince themselves that what they are getting is statistics. For example, they use Proc GLM for simple linear regression with lack of fit and challenge statisticians on the grounds that since SAS gave the result, it has to be true. Or they use Proc GLM to analyze repeated measures with a covariate constant over time and say that since SAS gives the result they have, they have to be correct. Many of these Data Scientists assume a program such as SAS is statistics. Possibly not having studied enough statistical theory, they do not want to challenge such a concept.

    In any case, I think we are wasting too much time (that could be used to study statistics, theory and practice) arguing whether statistics is dead or whether Data Science should replace statistics. Time and again many "Data Scientists" would come to me and my colleagues for advice whenever they encountered problems with their data. There is room for both. In my view, a good statistics group in industry or in regulatory organizations should have both good statisticians and Data Scientists. Finally, let me tease some of the responders: "Data is" is incorrect; the word Data is the plural of the Latin word "Datum", which is singular. So "Datum is" and "Data are" are always correct, whether in the Queen's or American English!

    Ajit K. Thakur, Ph.D.

    Retired Statistician (Not quite retarded yet)






  • 46.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-25-2017 13:03

    Re: "Finally, let me tease some of the responders: "Data is" is incorrect; the word Data is the plural of the Latin word "Datum", which is singular. So "Datum is" and "Data are" are always correct, whether in the Queen's or American English!"

    Good point, although "is" sort of goes with "data". Thus, how could we make it so with the above in mind? Easy! Use the word "set". ("The set of data is..." or "The data set is...")

    Unrelatedly, "set" is almost always a good word to use since, as far as I know, no other word in the English language has as many definitions...

    ____________________

    David Bernklau
    (David Bee on Internet)
    ____________________







  • 47.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-26-2017 10:20
    Well said, David. Thanks. As you indicated, we never encounter or deal with one "Datum"; we deal with many "Data" or one or many "Data set(s)"!

    Ajit Thakur, Ph.D.
    Retired but not yet Retarded Statistician!





  • 48.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-25-2017 13:03

    Consider, however, that Engineers get rich, while Physicists don't.







  • 49.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-01-2017 12:38
    I agree that Engineering is basically applied physics. But, you can be a data scientist and not use statistics. I went to a talk a few months ago where a psychology student used partial differential equations to develop and test a model for neural pathways in a brain under different conditions. Some of the material covered in various Data Mining texts and DM research deals with computationally complex, non-linear optimization. A lot of data science uses statistical methods and statistical theory. But, not all data science is statistically based.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 50.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-20-2017 09:56

    Andrew (Ekstrom),

     

    Respectfully, I think that the problem you're expressing is not nearly as bad as you think/say it is.

     

    I think you're making sweeping generalizations about "statisticians" based on your own personal/anecdotal experience, and assuming that represents "statisticians" and the field of "statistics" as a whole (i.e. your anecdote about your Master's degree program). 

     

    Most quality statisticians that I know already do exactly the things that you're calling for: ask probing questions about the source of the data, try to understand the data structure, and make those part of the consideration for their analytic approach.  To suggest that this is a problem with "statistics" is ridiculous.  If anything, most of us are more likely to encounter non-statistical collaborators (physicians, in my specific line of work) that are surprised that we want to know more about the source of the data, because they (the non-statistician) assume that data-is-data and the statistics come from a recipe in a statistical cookbook ("I have a continuous variable, which means that I should use ANOVA").  However, virtually no statistician that I've interacted with professionally takes the data at face value without asking any questions about the source. 

     

    You're building a strawman so you can tear it down.  Frankly, leveling this series of statements:

     

    "I understand where the data comes from. I openly question its validity.

    I want to know how the system works. I want to make it better.

    I refuse to fear finding things out that will shake my world. In fact, I welcome it.

    I refuse to take a myopic view of data and the world. I look for alternate points of view."

     

    ...at a discussion board populated by statisticians is very much preaching to the choir.

     

    Good statisticians already do all of these things. 

     

    As noted above, I'm much more likely to get pushback from non-statisticians in that regard.

     

    If you've not held any positions analyzing biochemical/chemical data, that is not a problem with the field of statistics, but with the people who run those specific departments to which you've applied that will not accept a statistician that asks questions about data.

     

    Thanks,

     

    Andrew D. Althouse, PhD

    Supervisor of Statistical Projects

    UPMC Heart & Vascular Institute

    Presbyterian Hospital, Office C701

    Phone: 412-802-6811

    Email: althousead@upmc.edu

    Twitter: @ADAlthousePhD

    Website: Faculty Profile

    Website: HVI-CBC Profile

     

     

     

     






  • 51.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-08-2017 17:17
    I agree and disagree with parts of what both Andrews said. Some statisticians check data and residual plots for anything that seems chemically or physically unreasonable and follow up when something looks "odd". Some of us question p-values that seem "unreasonably insignificant". Others just try to get a "sales job" done as quickly as possible. (The same sort of differences exist in all professional fields.) I don't know how to distinguish between the two other than by repeating their work or getting to know the individuals involved. Sometimes management attitudes consciously or unintentionally influence that sort of thing, and those attitudes can vary within a company over time or between departments.

    But that discussion has little to do with whether we should call what we do "data science".  I would not want to call statistics "data science" because the latter is usually associated with "big data" and Box, Hunter, Hunter statistics is completely different.

    BTW, like Andrew Ekstrom, I started out as a PhD polymer chemist.  I later regressed to an MS in Applied Statistics because Goodyear needed someone in house and happily paid for me to go back to graduate school at their expense and on their time.

    ------------------------------
    Emil M Friedman, PhD
    emilfriedman@gmail.com
    http://www.statisticalconsulting.org
    ------------------------------



  • 52.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 10-04-2017 15:29
    Andrew Ekstrom,

    I'd say your experience is atypical.  I've worked in government, academia, and now industry for 15 years.  Many data scientists (certainly not all) I've met, who weren't statisticians or actual scientists first, took a few Coursera courses and declared themselves data scientists.  They don't really understand measurement, bias, or even the assumptions behind the machine learning techniques they are using.  They don't get that prediction and inference take different approaches to the data and should involve slightly different statistical methods.  For example, while a model like the random forest can provide insight, it does so with a much higher computational burden and less power than more standard inferential statistical models.  It's a good technique, but you want people who know when and where to best use it.

    Just to give an example, I went to a local data science event in town.  One of the presenters, a guy with a Data Scientist job title, was discussing a Bayesian ecological prediction model a statistician on his team built.  He said, and I'm paraphrasing here, 'I don't really understand it, but it works really well'.  I wouldn't say that level of ignorance is typical, but it isn't rare.

    As these folks overpromise and underdeliver, I am starting to see more skepticism from business leaders about both data science and statistics.  That is really unfortunate.  The MS and MPH programs that train applied statisticians ground them in study design, measurement, and all the issues that can go wrong in improperly applying analytic techniques to flawed data.  That knowledge is considered critical to a good statistics education.  I've worked with a lot of statisticians over the years and I have never worked with one like you described.


    ------------------------------
    Angelique Zeringue, PhD
    Statistician
    Mercy Healthcare
    St. Louis, MO
    ------------------------------



  • 53.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-23-2018 15:33
    https://www.kdnuggets.com/2013/12/what-is-wrong-with-definition-data-science.html

    I wrote this article back in 2013. I think it is still pertinent today.

    ------------------------------
    Michael Mout
    MIKS
    ------------------------------



  • 54.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-19-2017 15:32
    As someone with post-graduate degrees in both Physics & Statistics I find this discussion offensive. The various science & mathematics fields including statistics have existed as concepts for centuries while the field of data science is a concept invented in the last 10 years or so. As far as I am aware, a Data Scientist need not have any formal training in science, mathematics or statistics yet it seems they can claim titles in all these areas. I have no problem recognizing "Data Science" or "Big Data" as specialty fields to be taught in most of our universities but without a requirement for formal training in science and statistics, they should not claim the titles of scientist or statistician. I have talked to many recent graduates with "data scientist" and "big-data theorist" backgrounds and for the most part they had minimal formal training in science, mathematics or statistics.

    It is one thing in a personal conversation or in a personal opinion blog to put down another group of humans but it is a more serious insult to come to an official "statistical" organization's website and claim the group should not exist. I feel my life has been invaded by trolls. Sadly, this is the way of the internet.

    It is my personal recommendation that this thread should be closed.





  • 55.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 09-21-2017 09:11

    I usually stand on the sidelines lurking but in this case I feel I would like to add my thoughts to this discussion. So-called "Data Science" is not statistics. As defined by Webster, statistics is:

    "a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data." Obviously, we can debate the use of "masses", but the initial focus on collection is noteworthy. How we look at any data set is based on education, training, and experience. My initial reaction is to reject all data handed to me. If I have not been involved with the planning of the collection of data, it is always suspect in my mind.

    It is interesting to me to see how someone with experience in the auto industry assumes the basic principles placed in automotive procedures are widespread and understood in every other company. Many of the procedures, methodologies, and management practices in automotive were implemented and championed by several statisticians.  The battles were legendary and frequently led by Dr. W. Edwards Deming, starting with his very simple question "Do you understand variation?" Applications of statistics could not have occurred without strong management support and a desire for change.

    When we consider the use of statistics in measurement, a focus was placed not just on accuracy but on repeatability and reproducibility of the entire measurement system - device, procedures, different labs, and humans. Systems and procedures were put in place to ensure engineering and production would implement these assessments. Management provided training and support. 

    When a new and important car component was being launched, the executive of the division was reviewing the process studies with the design and manufacturing engineers. When he questioned them about the variation of the measurement systems, he got a non-response. He decided this was too critical to the success of the plant, the new car, and the future of the company. He canceled a trip he had scheduled and called everyone involved to meet at the plant on the next Saturday. They spent the entire day conducting hundreds of measurement studies. His leadership was critical to the implementation of the procedures. He managed by them and eventually those procedures became routine to the organization and the company.

    There are many useful and critical applications of statistical methodologies to be used across many organizations. But without training for key executives and changes in practices, it is very difficult to get change.

    The other day, it was posted here about the NSF sponsoring a committee on repeatability and reproducibility of study results.  Whether this is for clinical studies, published scientific findings, or basic lab results, I think this is a step forward. If everyone looking at data pauses and considers how the data was generated, what process produced the results, and how much measurement variation added to the overall data variation, this would be a great leap forward for all of us.



    ------------------------------
    Eileen Beachell
    Quality Disciplines
    ------------------------------



  • 56.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-22-2018 22:53
    My earlier post (2017) recommended that we change the name. There were around 75 responses to the post and to related posts in this thread (in ASA Connect and anzstat). Most of the responders who expressed opinions were against the change. However, there is another group of thoughtful statisticians who see benefits of the change. Here are some key points:

    First, we must carefully consider whether the change would be disrespectful of current data scientists. I believe there can and should be no disrespect, only good intentions for a sensible change. 

    The change is sensible because although our specific methods often differ, we statisticians and data scientists do the same thing -- we invent, refine, and use general methods to draw reasonable conclusions from data that are obtained in scientific (empirical) research. Since both fields do the same thing, they should both have the same name to avoid confusion. This gives the overall field synergy as the two subfields combine, agree about, and disagree about their various approaches.

    The name "data science" for the overall field is best because it describes both fields best, sounding more interesting and more practical to laypeople than the puzzling (to them) term "statistics". 

    If we show proper respect for current data scientists, and if we rename the field of statistics as "data science", and if we convincingly explain why studying data is useful, then we will help laypeople to understand the field. This is important because most laypeople don't know what we do. (The field of statistics may be the most misunderstood of all modern scientific fields.) Thus a key reason for changing the name is that this would help laypeople to understand our field.

    Consider a second important reason for changing the name: In view of the large grants being awarded to data science, Terry Speed observed that the field of statistics "might miss out on the millions being lavished on data science right now" (2014, p. 3). Then Terry seemed to shrug. He said, "Let's wait for 10 years and see who is still talking about Big Data and data science" (p. 3). Thus Terry believed when he wrote his article that data science was a passing fad. 

    Here, we must distinguish between the name "data science" and the methods of data science. As with any field, some of the methods of data science will lose favor as other more effective methods are brought forward. But the name will prevail because (arguably) it perfectly describes at the highest level what statisticians and data scientists do.

    Perhaps to stimulate debate, Terry wrote:

    • Did any species ever avoid extinction by adopting a new name? No, they adapted, they evolved, and so must we (2014, p. 3). 

    But Terry's argument by analogy is invalid because there is no relationship between the name of a species and its survival or extinction. (Nor do species ever themselves adopt new names.) In contrast, there is a relationship between the name of the field of statistics and its prevalence. This is because the term "statistics" has four distinct meanings, as discussed in my earlier post. These four meanings give laypeople a sense of ambiguity at the outset. In contrast, the term "data science" has a single simple meaning that presents the key ideas of statistics in words that laypeople can understand. 

    Thus, if we change the name, and if we give a good explanation of why we are doing this, we will help laypeople to understand and respect the usefulness of the field of statistics. This will unlock resources. Thus a second reason for changing the name is that this would unlock more deserved resources for the field, including financial resources. 

    It goes almost without saying that if we change the name, we will merely be performing a sensible public-relations exercise -- we won't be changing the function of the field of statistics. 

    Let us avoid the divisive question of whether the present field of statistics is somehow superior to the present field of data science because this question is immature and irrelevant. Let us also avoid the question of what exactly is a statistician and what exactly is a data scientist because that appears to be emphasizing artificial boundaries and is thus an academic form of racial discrimination. We are all brothers and sisters. And the data-science arena is big enough for all who wish to understand data, from the highest to the lowest levels of knowledge, competence, and specialization. 

    If (by changing the name) we statisticians bring our ideas into the arena, then we can help those with gaps in their knowledge to close the gaps. And, by exploring the arena, we may close some gaps in our own knowledge.

    Importantly, if we change the name with tasteful eye-catching fanfare, this will generate substantial public interest. And if we make the change carefully in one coordinated swoop, this will generate respect for the field for its unified effort to clarify its function. 

    Here is a graph of the count of the number of activities using the phrase "data science" in the online program of the Joint Statistical Meetings (JSM) from 2008 to 2018:

    [Graph: count of JSM activities using the phrase "data science", 2008 to 2018]

    The graph and the current tenor of statistical discussions about data science suggest that we could do nothing, and the name of the field might change to "data science" on its own. But letting the change happen at its own speed won't have the same positive public-relations impact as changing the name in one swoop. In view of the broad misunderstanding of the field of statistics, changing the name in one swoop together with a carefully mounted public-relations campaign to (a) explain the change and (b) explain the role of the field is highly sensible.

    But could we obtain the necessary cooperation among the many statistical organizations, university and college statistics departments, statistical business units, and statistics publications to carry out a coordinated change? Arguably, yes. Almost all statisticians believe in the vital social good of the field of statistics. And we believe it is useful to promote the field in the best possible way. If that is true, and if enough statisticians agree that the change would be useful, then we can change the name in one swoop.

    Enhancing the understandability of the field of statistics is socially important because the field is ubiquitous, with an active or potential role in almost all scientific (empirical) research. Therefore, if we carefully explain why we are changing the name, we could get financial support from granting organizations to help with the cost of the change. 

    But how should we explain the role of statistics or data science in scientific research to laypeople? What is the role? A sensible explanatory principle is that data science (statistics) is mainly a set of techniques to discover relationships between variables in sample data. If we perform this discovery properly, it enables us to predict and control the values of variables in new entities from populations. (The study of univariate distributions in data is a mathematically degenerate case.) We could discuss practical examples from different fields (e.g., science, engineering, technology, business, social services, sports) to illustrate how knowledge of relationships between variables is useful for prediction and control. And we could explain (in lay terms, no math) how data science (statistics) helps researchers to obtain knowledge of relationships between variables from data.

    To implement the change, statistical societies could jointly appoint a committee of senior representatives of the various groups with an interest in the change. The committee could compile the wishes of all the involved groups and then plan a strategy. They could specify a timeline and a unified set of information releases and events, with carefully written material to help laypeople to understand. The committee might hire a public relations firm with understanding of the layperson's perspective to advise the committee and to help with stylistic aspects of the implementation. Arguably, the success of the campaign would be mainly determined by (a) the statistical quality and understandability of the material its writing team produces and (b) its ability to meet the requirements of its stakeholders, including laypeople, statisticians, and current data scientists.

    We could do this. If we show proper respect for current data scientists, then our work with data gives us the right to say that we, too, are data scientists. And we can bring (with proper humility and proper pride) our powerful statistical methods into the data-science arena. And we can make the field of statistics easier for laypeople to understand and respect.

    Excerpts from Jon Wellner's 2017 Institute of Mathematical Statistics (IMS) presidential address, with discussion about data science, are linked below. 

    A recent article by Sofia Olhede and Patrick Wolfe, co-coordinators of the IMS Data Science Group, is linked below. 

    Donald B. Macnaughton


    References


    Macnaughton, D. B. (2017), "Should We Change the Name of the Field of Statistics to 'Data Science'?" https://matstat.com/macnaughton2017a.pdf  September 12, 2017

    Olhede, S. and Wolfe, P. (2017), "Quo Vadis Data Science?", IMS Bulletin, 46 (8), 10-11. http://bulletin.imstat.org/wp-content/uploads/Bulletin46_8.pdf

    Speed, T. (2014), "Trilobites and Us", AmStat News, January 1, 2014. http://magazine.amstat.org/blog/2014/01/01/trilobites-and-us/   [This link was broken at the time of this posting.]

    Wellner, J. A. (2017), "Teaching Statistics in the Age of Data Science (excerpts from 2017 IMS Presidential Address)", IMS Bulletin, 46 (7), 6-11. http://bulletin.imstat.org/wp-content/uploads/Bulletin46_7.pdf


    Notes


    The counts on the graph were obtained through an Abstract Keyword Search for an "Exact match" of the phrase "data science" (without the quotation marks) in the online program on the American Statistical Association website for the Joint Statistical Meetings for each of the years. These counts appear at the top of the Results page of a search. These are counts of individual presentations, sessions, committee meetings, organized luncheons, courses, etc., whose title, keywords, or abstracts used the phrase. The counts were obtained on June 13, 2018.

    This post was sent to the American Statistical Association Community email list, the anzstat email list, the Statistical Society of Canada email list (d-ssc), and was announced in Allstat, all on or shortly after July 22, 2018.


  • 57.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-23-2018 14:43
    I think where people go wrong is to think that data scientists exist.  I certainly think the field of data science exists but the field is simply too broad to be fulfilled by a single individual.  An effective data science team is a blend of skills including statistics.  This radar by Mango Solutions is a good example of this idea and I presented a similar concept at the RSS conference in Sheffield in 2014.

    Data Science Radar


    ------------------------------
    Nigel Marriott
    Statistical Consultant
    Marriott Statistical Consulting Ltd
    ------------------------------




  • 58.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-24-2018 15:25
    I find this discussion interesting since I was asked about the term "data science" in my statistical work in healthcare.

    Do I agree with changing the name statistics (or biostatistics) to data science (or data scientist)?  No.  I agree with those who have already answered that data science is a collaborative team working together to find out what the data are telling us using millions of pieces of data.  Yes, there are powerful software tools out in the market to sift through all that data.  However, relying on just the software is not a good idea.  Why?  The algorithms built to do the mining and analysis could be wrong from the beginning.  This is why working with a test data set is always a good idea.

    When I started learning SPSS Modeler (which is one of the data mining tools in the market), one of the learning modules mentions data science in the data mining process.  Statistics is one of those areas which is key to data mining.  Statisticians are trained to dive into their data to find out what is good or bad about it before analyzing it.  We do this on a smaller scale than data mining.

    Patricia Schlorke, DrPH
    Administrative Business Intelligence I
    Clinical Informatics Department
    Texas Health Resources





  • 59.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-25-2018 16:35

    From the health data perspective in the US, the regulator has clearly voiced its keen interest in Real World Evidence, which is already written into law in the 21st Century Cures Act:

    "The term 'real world evidence' means data regarding the usage, or the potential benefits or risks, of a drug derived from sources other than randomized clinical trials."

    In the NIH Strategic Plan for Data Science: "To advance NIH data science across the extramural and intramural research communities, the agency will hire a Chief Data Strategist."

    While we may wish to hold onto classical statistics only, the challenges arising from diverse data sources will require broad skill sets. Data science talent is needed, as well as deep knowledge in statistics.

    Therefore, I suggest using "Statistics and Data Science" to be inclusive.

     

    References

    https://blogs.fda.gov/fdavoice/index.php/tag/real-world-evidence

    https://www.fda.gov/NewsEvents/Newsroom/FDAInBrief/ucm613793.htm

    https://www.congress.gov/114/bills/hr34/BILLS-114hr34enr.pdf

    https://datascience.nih.gov/strategicplan

    https://ww2.amstat.org/meetings/wsds/2018

     






  • 60.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-30-2018 18:39

    As the 21st Century Cures Act has come up in this connection, it seems an example of why statistics and experimental design need to be a key part of data science (quite aside from name changes). This huge bill had enough good things in it to cobble together endorsements, but it also eliminates informed consent so long as there is considered to be "minimal risk" to people. It does not explicate what this consists of or who decides. It's one thing to use our medical data without our consent, but what about failing to inform us we're in an experiment so long as someone deems the risks minimal and there are "safeguards"? "Real world data" certainly sounds more appealing than observational, uncontrolled, anecdotal, or data-dredged data.



    ------------------------------
    Deborah Mayo
    Virginia Tech
    ------------------------------



  • 61.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-23-2018 14:45

    Perhaps include both Statistics and Data Science in the name to be more inclusive, as the ASA's Women in Statistics and Data Science (WSDS) conference series did.

     

    https://ww2.amstat.org/meetings/wsds/2018






  • 62.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-23-2018 14:48

    As someone

    ·  outside academia,
    ·  who works in databases and with data all day, and
    ·  has only recently started studying statistics,

    my initial gut reaction to the proposal to rename the field is NOOO!!!

     

    While I agree with most of the positions laid out in the proposal, I take issue with a key piece of the argument: 

     

    "This is because the term "statistics" has four distinct meanings, as discussed in my earlier post. These four meanings give laypeople a sense of ambiguity at the outset. In contrast, the term "data science" has a single simple meaning that presents the key ideas of statistics in words that laypeople can understand"

     

    In my day-to-day work I come across so-called 'data scientists' with experience ranging from a 3-day training course in R to multiple PhDs.  The definitions of what they do vary much more widely than the 4 clear definitions of statistics.  I find some who define data science as the type of work I already do (cleaning and organizing data... regardless of how it was collected or its veracity), while for others the skill sets of statistics are only a small, foundational piece of the background of a data scientist (note this from Swami Chandrasekaran in 2013 http://nirvacana.com/thoughts/2013/07/08/becoming-a-data-scientist/ ).

     

    While statistics is, in my opinion, absolutely a part of data science, I feel dropping the name 'statistics' makes it more confusing.  Now it will be unknown which part of the data science universe one is in.  I think the field and its practitioners would be best served to continue calling the field 'statistics', underneath the umbrella of data science.

     

     

     

    Jenny Mahoney
    Developer, Business Intelligence II
    303-449-6444,1239
    jennym@spectralogic.com


    Spectralogic

     

     






  • 63.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-23-2018 17:21
    No, we should not change the name "Statistics" to "Data Science".  The word "Statistics" is extremely broad: theoretical statistics (viz. Markovian and non-Markovian stochastic processes, queueing theory, and other probability theories, etc.), applied statistics (design of experiments, analysis of designed experiments such as ANOVA, ANCOVA, MANOVA, and MANCOVA, linear, multiple linear, and nonlinear regressions, etc.), quality control analyses, and biostatistics (clinical and nonclinical trials, modeling of biological processes, etc.), not all of which deal with actual experimental data.  On the other hand, the so-called "Data science" only covers part of this broad spectrum.  Just remember, Fisher, Pearson (both Karl and his son E.S.), Neyman, Kempthorne, Cox, and various others developed theories and techniques.  The techniques part probably is the one that one should categorize as "Data science".

    Ajit K. Thakur, Ph.D.
    Retired Statistician





  • 64.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-24-2018 15:27
    With all due respect to those who argue for changing the name 'statistics' to 'data science', that's a terrible idea. Here is a list of my last 3 activities today...
    1) Developed a strategy for maintaining sufficient power and type I error when enrolling an additional patient at a lower dose in a phase I clinical trial of dendritic cell-based immunosuppression
    2) Discussed strategies for more efficient analysis approaches for evaluating treatment effects on liver fibrosis with mediating effects of alcohol use
    3) Recommended strategies for modeling effects of highly correlated cardiovascular risk factors and other biomarkers on imaging measures in rheumatoid arthritis patients

    All of these discussions and the subsequent analyses are clearly statistics and outside of most definitions of data science, but are incredibly valuable to the scientific fields.
    And I totally disagree that data science is somehow a clearer term to the public. Perhaps we are just not very good at selling the importance of statistics... A name change is not the solution to that problem.

    I also reject the idea that jobs in statistics are more sparse. To the contrary... there are many more jobs than there are candidates. Hiring staff in my lab is always a struggle.

    Keep in mind, I am completely on board with promoting data science... In fact, when I started a position in biomedical informatics, I named my lab the Biomedical Statistics & Data Science Lab in recognition of the importance of both of those areas. But let's not throw everything in with data science. There's no reason to do so.

    ------------------------------
    Douglas Landsittel
    Professor of Biomedical Informatics
    University of Pittsburgh
    ------------------------------



  • 65.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-24-2018 20:37
    I notice that Mr. Macnaughton in his recent reply was very selective and biased.

    He starts his response with [My italics] "another group of thoughtful statisticians who see benefits of the change." Sort of implying that the other stats guys were not thoughtful. He continues: "Here are some key points:" Which he follows with key points for the change and none of the "key points" against the change.

    I'm a long retired stats guy and really have no skin in the game; however, if you are going to try to convince people of your point of view, at the very least you should honestly represent both sides. Unfortunately, Mr. Macnaughton did not do this.

    I could go through and refute many of his claims, but the interested reader could do the same by reading through all 63 (as of today) responses. I will not do that; but, I will respond to his first point.

    "The change is sensible because although our specific methods often differ, we statisticians and data scientists do the same thing."

    Well, that's like saying that a gopher does the same thing as a farmer because they both dig in the dirt.

    Data Scientists (if you believe the PR) do analytics, programming, and database development and are experts in all three fields. I suspect many stats guys have done all of those things; but, certainly that is not what you think of when you hear the name statistician. Most stats guys, at least those in applied stats, will claim to be experts in analysis. But it would be fanciful and outright false to say we are experts (or even necessarily competent) in those other two fields.

    If a Data Scientist wants to "claim" they are experts in those three fields I would expect them to prove it before hiring them. Certainly, a "good" data scientist should have a reasonable amount of knowledge about all three, but they will not be experts. So, if you want someone to do true in-depth analytics, hire a stats guy; if you want a good programmer or system designer, hire an IT guy; and if you want someone to build a sophisticated, effective database, hire a data guy.

    In summary, if you want a jack-of-all-trades (but not necessarily a master of any) hire a data scientist. But if your needs are a bit more sophisticated, hire a specialist.

    Certainly, I mean no offense to Data Scientists; but, they are not stats guys.


    ------------------------------
    Michael Mout
    MIKS
    ------------------------------



  • 66.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-25-2018 15:45

    Having read through many of the arguments both pro and con for changing the name of our field from "statistics" to "data science", I find the reasoning and logic behind keeping the names separate to be the most compelling.

     

    What I think is needed is a greater understanding of how these two fields interact, overlap and differ. I think many statisticians and many data scientists do not have a firm understanding of how the two fields relate and which skill sets are the same and which are different. And if that is true within our fields, think how much less understanding there is in the general public about these two fields.

     

    Just my $0.02.

     

    Best Regards,

    Bruce White

     

    Statistician

    (651) 795-6534

    Computare in aeternum

    Bruce.white@ecolab.com

     






  • 67.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-28-2018 14:37
    If the word "Science" is to be included, it seems imperative that we avoid using innuendo in the arguments for or against (as in Mr. Macnaughton's "thoughtful statisticians" response).

    Right now I'm not in favor of changing the name but I would like to hear more about the potential advantages of a name change.  (An argument that funding for statistical research would be improved is not a satisfying reason although it would become a practical reason if funding would otherwise be nearly non-existent.)

    I also wonder if Economists have a similar issue.  Should they be calling themselves "Financial Scientists" ?




    ------------------------------
    Jim Baldwin
    Retired
    ------------------------------



  • 68.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-24-2018 15:31
    It seems that a huge problem is that most statistics courses, even many master's courses, rarely deal with real-world data (e.g. high-dimensional and messy) and rarely ask relevant real-world questions (e.g. prediction).

    We're still teaching methods for small, idealized datasets, methods that were developed for simple 2- or 3-dimensional agricultural statistics.

    It's quite possible to do an MS in conventional statistics without being an amazing computer programmer these days -- that's just wrong, as real-world data requires creativity to solve most real problems -- and standard packages like SAS and SPSS do not allow for such creativity.  You have to know how to think & how to code, not just how to use 50-year-old methods built for 2D & 3D problems.

    It seems that the core statistics curriculum, while extremely valuable, just isn't that relevant to what most people need on the job.  There is an opportunity cost to spending a whole lecture teaching a student how to derive the Rao-Blackwell Theorem (when was the last time you used that?).  

    I agree with a previous poster who mentioned how many of the Masters Data Science degrees are cash / money making degrees for stats departments.  But they're also (presumably) providing skills that employers with large, messy datasets desire / require.

    Creating "Statistics & Data Science" departments is probably wise.  I didn't used to like the term Data Science -- at first I thought people were using it just because "Statistics" is boring and "Data Science" was just marketing, but I now appreciate that Data Scientists tend to be more open-minded than many statisticians -- as scientists should be.

    I think if we as statisticians hold onto our old methods & old ways we're going to be left in the dust by creative programmers who might not know the theory, but can provide more relevant and robust answers to today's messy problems.





    ------------------------------
    Jason Connor
    ConfluenceStat, LLC
    ------------------------------



  • 69.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-25-2018 16:37

    Jason:

    I believe many people have addressed this, however I will once again state, I think your experience with statisticians teaching only benign or contrived theory that only works in a perfect world was an artifact of your program.  In my classes and as a research assistant, I have had my hands on 'messy' data.  Many of our department meetings are not only around the theory of how to deal with these things, but also how to write programs or use existing ones to make the analysis part of the project work efficiently.

    Bear in mind, the reason to start off with a theory that deals with "perfect" settings, is to eventually get to more realistic models and techniques that deal with data as you might encounter it.  Both facets of statistics are covered in most programs and are developed both theoretically and with application to real data. 

    I think your (and some others') assertions that statisticians are set in their ways, are not open-minded, don't care about programming, etc., are simply incorrect.  I am sure you can find people like that; however, my experience in my program, talking with other statisticians, and at conferences has been that people are extremely creative (at JSM we have computational and data science sections - so if we are not open-minded, how did those come about?), tell you to learn statistical packages and go back to programming languages, and are constantly evolving.  So, your thinking that you have to be able to code/program is absolutely spot on and shared by most statisticians.  My PhD advisor right now is constantly telling me to get better at programming and encouraging it.

    I think the one fatal thing you said, and it is a problem with data scientists who lack solid theoretical training, is captured in this statement: "....creative programmers who might not know the theory, but can provide more relevant and robust answers to today's messy problems."

    Without knowing theory, which you seem to dismiss as fruitless or a waste of time, how will they know they have correctly "modeled" their data process well enough to understand that it is in fact robust?  This is a case of simply not knowing what you don't know.   The way in which your theory/model breaks will be a way (or most likely multiple ways) that was never contemplated. The reason for theory, and for modeling the problem in the most general sense, is that you can then understand how nuanced it might be and where it may break down.  Then after it is presented, people will examine it and also provide more insight. 

    As many noted previously, data science and statistics have cross-over.  However, since data science hasn't been as well established, knowing where this line is will probably take some time.  I think the best course of action is to continue working together at conferences and on collaborations, and to keep the lines of communication open so that when or if there is a more natural separation, we can clearly articulate it.    As of today, as many have noted, data science is altogether too broad and doesn't necessarily convey the exact skill set someone has.  And as Michael Mout points out, I am not going to over-promise my skills.  So if someone wants to say they are an expert at stats, programming, and database development/infrastructure, have at it.  I agree that the best approach is probably to have experts in each area working in small teams to get optimal results.



    ------------------------------
    Michael Machiorlatti
    Phd candidate - University of Oklahoma Health Sciences Center
    ------------------------------



  • 70.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-27-2018 10:47
    That's great to hear Michael

    And for clarity, my program was fantastic and stressed programming, creativity, and real-world data -- we had requirements like that for both the MS and PhD.

    My bias is that I've found that others oftentimes just want to use what's in the toolbox, which is a pretty limited set of tools.  Furthermore, the standard stat toolbox isn't very good for multidimensional problems.

    I'm glad to hear your program is different -- and hopefully most graduate programs are these days.  That way new statisticians will have been trained in these techniques.  I think one issue is many statisticians in practice never had such an opportunity (in part because many methods didn't exist during their/our training).

    Cheers







  • 71.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-25-2018 17:12
    YES YES YES YES YES YES YES!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    If you look at most traditional stats programs, they require students to take mathematical and statistical theory courses. Taking courses outside the department is often frowned upon. When programs require "cognate" courses, most of those courses are on how those other departments look at data analysis. With many of the Data Science programs I've seen, one needs to have a base of knowledge outside the stats or comp sci department. They even have project requirements where they are given real world data and told to analyze it. Meanwhile, after 16 stats classes, I don't remember analyzing a data set that had more than 100 or so tuples and everything came from a textbook web site.  

    I wouldn't agree that many Data Science programs are money grabs. I'm sure some are. If you have a Data Science program that is housed outside the stats department, say in an engineering department, it tends to have outside consulting and advisory groups made up of corporate leaders who see a need for certain types of knowledge in their new hires. For example, the advisory board for the MS Data Science program at U of Mich Dearborn has senior managers, VPs and CTOs from several local Fortune 500 companies. So, they asked employers, "What do you want a new hire to know?" They work WITH companies to provide real world experience and data sets. Most of the programs require some sort of project with a local company, unless the students already work in the field. 

    I don't see those advisory boards for most of the local (to me) stats and mathematics programs.  I don't see any local stats programs deciding a 3-4 credit capstone project is more important than another stats class.

    On top of that, the engineering-housed programs end up using ABET-accredited courses and curricula. I don't see any ASA-accredited stats programs.  




    One of the biggest differences between traditional statistics and a lot of the "data science" methods is that statisticians rely on looking at plots of data. They feel a well-trained eye is better than any mathematical computation one can make. With several of the data sets we used in a survival analysis class I took, the "goodness of fit" tests all showed that the results of different levels of an accelerated life test came from different distributions than the one the author told us to assume. The plots with minimal amounts of data "looked ok" according to the professor with 40+ years of experience in reliability analysis. So, according to the stats prof, "Don't trust math."

    When I used the same data set in a different class through an engineering department, we fit more appropriate distributions to the data. The engineering professor told us about how the traditional method of assuming a single distribution fits all the data has held back quality improvements and reliability estimations. We were told to, "Trust the data and use the appropriate theory." 



    For the statistician who thinks human experience is a useful tool and "unbiased", ask a prosecutor about the quality of "eye witness" testimony. Talk to a defense attorney and psychologists about the "selective memory" of detectives when they get a "confession" from a suspect, and about how the tools used by those detectives bias what the suspects say. Just think about all the "Project Innocence" programs throughout the USA. 

    If you want an example of how humans miss something that is right in front of them, listen to Pink Floyd's Another Brick in the Wall Part 2. What do the children say at the end of the song? 


    There is also an idea that when a Data Science model doesn't work, it's proof that Data Science is bad and wrong. When traditional stats models fail, statisticians chalk it up to, "There's always a chance of being wrong." 

    Nature did a survey of scientists and asked them about the reproducibility of studies they read about and studies they performed. According to the survey, over 50% of the medical research couldn't be reproduced. Given the number of biostatisticians that work on medical research, do we really want to say the statistical analysis was flawed, there was a poorly designed experiment, etc? If so, what were the biostatisticians doing?   

    https://www.nature.com/news/1-500-scientists-lift-the-lid-on-reproducibility-1.19970


    Data Science is flawed. So is Statistics. If we use all the tools that are available to us, we will all be better.

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 72.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-27-2018 08:58

    What were the biostatisticians doing on studies that were not reproducible? 

     

    Studies almost never have only one test of significance.   Four strategies for multiple tests of significance are:

    1. Use p < .05 for every test of significance.
    2. Use p < .05 for every test of significance, but note lack of adjustment for multiple tests of significance as a limitation of the study.
    3. Partition the study into families of hypotheses, and test each family using some procedure for multiple tests of significance that controls its family-wise error rate to p ≤ .05.
    4. Control the experiment-wise error rate to p ≤ .05.  (I doubt any biostatistician would propose this, although it could be useful, post hoc, to respond to an objection to multiple family-wise tests.)

    I doubt that the biostatistician would make the final decision on what strategy to use in the paper if (1) and (2) provide "significant" results, while (3) and (4) do not.  This decision may depend on the perception of the journal as prioritizing statistical significance of results or statistical rigor of methods.  A 1.5 strategy would be to submit the paper using (1), planning to revise to (2) if a reviewer protests.
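
    As a rough illustration of why strategies (1)/(2) and (3) can disagree, here is a minimal Python sketch. It is not taken from any particular study: the p-values are hypothetical, and Bonferroni and Holm are just two common choices of "some procedure" that controls the family-wise error rate within one family.

```python
def unadjusted(p_values, alpha=0.05):
    """Strategies (1)/(2): test every hypothesis at alpha, no correction."""
    return [p < alpha for p in p_values]


def bonferroni(p_values, alpha=0.05):
    """Strategy (3), one option: test each hypothesis at alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Strategy (3), another option: Holm's step-down procedure.

    Sort the p-values, compare the k-th smallest (k = 0, 1, ...) with
    alpha / (m - k), and stop at the first one that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every larger p-value is also retained
    return reject


# Hypothetical p-values for one family of four hypotheses (illustrative only).
family = [0.004, 0.013, 0.030, 0.045]

print(unadjusted(family))  # [True, True, True, True]
print(bonferroni(family))  # [True, False, False, False]
print(holm(family))        # [True, True, False, False]
```

    With these made-up numbers, all four results are "significant" without adjustment, but only one or two survive family-wise control. Strategy (4) would amount to applying the same kind of correction to the pooled p-values from every family in the study, which makes the per-test threshold stricter still.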







  • 73.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-24-2018 16:30
    "Statistical data science", anyone?

    ------------------------------
    Piotr Fryzlewicz
    London School of Economics
    ------------------------------



  • 74.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-25-2018 15:27
    Simply put - NO!

    Looks like many (but not all) data scientists have not heard of the Neyman-Pearson Lemma. Point Estimation? Forget about it. 

    And we have not even started on Mathematical Statistics.


    ------------------------------
    Girish Ganesan
    ------------------------------



  • 75.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-26-2018 15:08
    I agree with Girish Ganesan.
    Not only are many "data scientists" not familiar with statistical concepts on a "graduate school" level, but also many theoretical statisticians, and probabilists who may be very good statisticians, do not touch data.

    --
    David R. Bristol, PhD
    President, Statistical Consulting Services, Inc.
    1-336-293-7771





  • 76.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-28-2018 08:33

    It seems like lack of math comprehension is an issue in many fields.  Are the math prerequisites for data science degree programs any different than those for applied statistics?






  • 77.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-31-2018 11:28
    What makes a data science degree? Does a business analytics degree count? What about a Comp Sci degree? 

    For example, the MS Data science degree at U of M - Dearborn requires Calc 1, Calc 2, Linear Alg and Calc based Prob and Stat. Calc 3 is suggested.

    For the MS Comp Sci degree, where you can focus on data science, the requirements are Calc 1, Calc 2, Calc 3, Linear Alg, Discrete Math and Prob and Stat.

    For the MS Business Analytics degree, Business Calc and Business Stats. 


    I've seen psychologists with no math beyond pre-calc use partial diff eqns to model signals in the human brain. They use fMRI data to test their models too. That's definitely data science. But, they don't have a degree in "data science".

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 78.  RE: Should we change the name of the field of statistics to "data science"?

    Posted 07-29-2018 06:05

    Dear colleagues,

    what is now called the field of data science comprises many skills.
    In its 2013 field guide on the Management of Big Data projects [http://www.bitkom.org/de/publikationen/38337_76511.aspx], the BitKom predicted 4 new professional profiles that would evolve in the wake of the Big Data economy. The report called these the data innovator, the data scientist, the Big Data developer and data artist. The field guide required statistical skills among others for the data scientist.

    We don't talk much about big data anymore (unless we are in a backwater sales department), since what big data is may well depend on your technological abilities.
    The discussion on the same topics has moved on to the term data science.
    However, some other authors prefer (or at least preferred for a time) the term 'analytics' - as in people analytics, web analytics, business analytics. It seems to me the terms are interchangeable.
    Merriam-Webster defines statistics as 'a branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data'. Also, statistics typically uses the tools of probability theory and stochastics to that end.
    With respect to data, it is an important asset required in analysing, understanding and presenting data.
    Dealing with real world data requires (and required) many more skills and strongly involves cross-functional teams. Many of the required skills are non-mathematical.
    In a contemporary context, you need proficiency in usability, psychology, graphic design, the domain your data comes from, the technology & software-tools you are applying, understanding the needs of your audience/stakeholders and  so much more - depending on the project.
    Any term that tries to comprise so many different trades will remain hazy.

    To sum it up: 'Data science' as a concept is fuzzy without a given context and may require many different skills.
    You might as well call it 'timey-whimey-wibbly-wobbly-theory-of-every-data-thingy'.
    Statistics is one of these skills, but one needed in most (if not all) data contexts. For this reason, it is easy to mix up the two.
    Statistics focusses on mathematical approaches to data, typically applying probability theory and stochastics.
    Defining 'data science' and statistics to be the same thing is not just inadequate. It is simply and plainly wrong.

    Best regards,
    Christian




    ------------------------------
    Christian Graf
    Dipl.-Math.
    Qualitaetssicherung & Statistik

    "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

    Ronald Fisher in 'Presidential Address by Professor R. A. Fisher, Sc.D., F.R.S. Sankhyā: The Indian Journal of Statistics (1933-1960), Vol. 4, No. 1 (1938), pp. 14-17'
    ------------------------------