My earlier post (2017) recommended that we change the name. There were around 75 responses to the post and to related posts in this thread (in ASA Connect and anzstat). Most of the responders who expressed opinions were against the change. However, another group of thoughtful statisticians sees benefits in the change. Here are some key points:
First, we must carefully consider whether the change would be disrespectful of current data scientists. I believe there can and should be no disrespect, only good intentions for a sensible change.
The change is sensible because, although our specific methods often differ, we statisticians and data scientists do the same thing -- we invent, refine, and use general methods to draw reasonable conclusions from data that are obtained in scientific (empirical) research. Since both fields do the same thing, they should, to avoid confusion, have the same name. A single name also gives the overall field synergy as the two subfields combine their approaches and debate where they agree and disagree.
The name "data science" is best for the overall field because it describes both fields well and because it sounds more interesting and more practical to laypeople than the term "statistics", which puzzles them.
If we show proper respect for current data scientists, and if we rename the field of statistics as "data science", and if we convincingly explain why studying data is useful, then we will help laypeople to understand the field. This is important because most laypeople don't know what we do. (The field of statistics may be the most misunderstood of all modern scientific fields.) Thus a key reason for changing the name is that this would help laypeople to understand our field.
Consider a second important reason for changing the name: In view of the large grants being awarded to data science, Terry Speed observed that the field of statistics "might miss out on the millions being lavished on data science right now" (2014, p. 3). Then Terry seemed to shrug. He said, "Let's wait for 10 years and see who is still talking about Big Data and data science" (p. 3). Thus, when he wrote his article, Terry believed that data science was a passing fad.
Here, we must distinguish between the name "data science" and the methods of data science. As with any field, some of the methods of data science will lose favor as other, more effective methods are brought forward. But the name will prevail because (arguably) it perfectly describes, at the highest level, what statisticians and data scientists do.
Perhaps to stimulate debate, Terry wrote:
- Did any species ever avoid extinction by adopting a new name? No, they adapted, they evolved, and so must we (2014, p. 3).
But Terry's argument by analogy is invalid because there is no relationship between the name of a species and its survival or extinction. (Nor do species ever themselves adopt new names.) In contrast, there is a relationship between the name of the field of statistics and its prevalence. This is because the term "statistics" has four distinct meanings, as discussed in my earlier post. These four meanings give laypeople a sense of ambiguity at the outset. In contrast, the term "data science" has a single simple meaning that presents the key ideas of statistics in words that laypeople can understand.
Thus, if we change the name, and if we give a good explanation of why we are doing this, we will help laypeople to understand and respect the usefulness of the field of statistics. This will unlock resources. Thus a second reason for changing the name is that this would unlock more deserved resources for the field, including financial resources.
It goes almost without saying that if we change the name, we will merely be performing a sensible public-relations exercise -- we won't be changing the function of the field of statistics.
Let us avoid the divisive question of whether the present field of statistics is somehow superior to the present field of data science, because this question is immature and irrelevant. Let us also avoid the question of what exactly a statistician is and what exactly a data scientist is, because that question appears to emphasize artificial boundaries and is thus an academic form of racial discrimination. We are all brothers and sisters. And the data-science arena is big enough for all who wish to understand data, from the highest to the lowest levels of knowledge, competence, and specialization.
If (by changing the name) we statisticians bring our ideas into the arena, then we can help those with gaps in their knowledge to close the gaps. And, by exploring the arena, we may close some gaps in our own knowledge.
Importantly, if we change the name with tasteful eye-catching fanfare, this will generate substantial public interest. And if we make the change carefully in one coordinated swoop, this will generate respect for the field for its unified effort to clarify its function.
Here is a graph of the number of activities using the phrase "data science" in the online program of the Joint Statistical Meetings (JSM) from 2008 to 2018:
The graph and the current tenor of statistical discussions about data science suggest that we could do nothing, and the name of the field might change to "data science" on its own. But letting the change happen at its own speed won't have the same positive public-relations impact as changing the name in one swoop. In view of the broad misunderstanding of the field of statistics, changing the name in one swoop, together with a carefully mounted public-relations campaign to (a) explain the change and (b) explain the role of the field, is highly sensible.
But could we obtain the necessary cooperation among the many statistical organizations, university and college statistics departments, statistical business units, and statistics publications to carry out a coordinated change? Arguably, yes. Almost all statisticians believe in the vital social good of the field of statistics. And we believe it is useful to promote the field in the best possible way. If that is true, and if enough statisticians agree that the change would be useful, then we can change the name in one swoop.
Enhancing the understandability of the field of statistics is socially important because the field is ubiquitous, with an active or potential role in almost all scientific (empirical) research. Therefore, if we carefully explain why we are changing the name, we could get financial support from granting organizations to help with the cost of the change.
But how should we explain the role of statistics or data science in scientific research to laypeople? What is the role? A sensible explanatory principle is that data science (statistics) is mainly a set of techniques to discover relationships between variables in sample data. If we perform this discovery properly, it enables us to predict and control the values of variables in new entities from populations. (The study of univariate distributions in data is a mathematically degenerate case.) We could discuss practical examples from different fields (e.g., science, engineering, technology, business, social services, sports) to illustrate how knowledge of relationships between variables is useful for prediction and control. And we could explain (in lay terms, no math) how data science (statistics) helps researchers to obtain knowledge of relationships between variables from data.
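For readers who want the explanatory principle in concrete form, here is a minimal sketch of "discover a relationship in sample data, then use it to predict a new entity." It assumes the simplest case: one predictor variable x, one response variable y, and a straight-line relationship estimated by ordinary least squares. The five data points are made up purely for illustration (they are not real data), and the function names fit_line and predict are hypothetical helpers written for this example, not part of any library.

```python
def fit_line(xs, ys):
    """Estimate the straight-line relationship y = a + b*x from sample data
    using ordinary least squares. Returns the intercept a and slope b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x):
    """Use the estimated relationship to predict y for a new entity."""
    return a + b * x


if __name__ == "__main__":
    # Hypothetical, illustrative sample of five entities (not real data).
    xs = [1, 2, 3, 4, 5]
    ys = [3.1, 4.9, 7.2, 8.8, 11.0]
    a, b = fit_line(xs, ys)
    print(f"intercept={a:.2f} slope={b:.2f}")
    # Predict the response for a new entity with x = 6.
    print(f"predicted y at x=6: {predict(a, b, 6):.2f}")
```

Running this prints an estimated intercept of about 1.09, a slope of about 1.97, and a predicted value of about 12.91 for a new entity with x = 6. A straight line is only the simplest choice; the same discover-then-predict pattern applies to the many richer models used in statistics and data science.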
To implement the change, statistical societies could jointly appoint a committee of senior representatives of the various groups with an interest in the change. The committee could compile the wishes of all the involved groups and then plan a strategy. They could specify a timeline and a unified set of information releases and events, with carefully written material to help laypeople to understand. The committee might hire a public relations firm with understanding of the layperson's perspective to advise the committee and to help with stylistic aspects of the implementation. Arguably, the success of the campaign would be mainly determined by (a) the statistical quality and understandability of the material its writing team produces and (b) its ability to meet the requirements of its stakeholders, including laypeople, statisticians, and current data scientists.
We could do this. If we show proper respect for current data scientists, then our work with data gives us the right to say that we, too, are data scientists. And we can bring (with proper humility and proper pride) our powerful statistical methods into the data-science arena. And we can make the field of statistics easier for laypeople to understand and respect.
Excerpts from Jon Wellner's 2017 Institute of Mathematical Statistics (IMS) presidential address, with discussion about data science, are linked below.
A recent article by Sofia Olhede and Patrick Wolfe, co-coordinators of the IMS Data Science Group, is linked below.
Donald B. Macnaughton
References
Macnaughton, D. B. (2017), "Should We Change the Name of the Field of Statistics to 'Data Science'?", September 12, 2017. https://matstat.com/macnaughton2017a.pdf
Olhede, S. and Wolfe, P. (2017), "Quo Vadis Data Science?", IMS Bulletin, 46 (8), 10-11. http://bulletin.imstat.org/wp-content/uploads/Bulletin46_8.pdf
Speed, T. (2014), "Trilobites and Us", AmStat News, January 1, 2014. http://magazine.amstat.org/blog/2014/01/01/trilobites-and-us/ [This link was broken at the time of this posting.]
Wellner, J. A. (2017), "Teaching Statistics in the Age of Data Science (excerpts from 2017 IMS Presidential Address)", IMS Bulletin, 46 (7), 6-11. http://bulletin.imstat.org/wp-content/uploads/Bulletin46_7.pdf
Notes
The counts on the graph were obtained through an Abstract Keyword Search for an "Exact match" of the phrase "data science" (without the quotation marks) in the online program on the American Statistical Association website for the Joint Statistical Meetings for each of the years. These counts appear at the top of the Results page of a search. They are counts of individual presentations, sessions, committee meetings, organized luncheons, courses, etc., whose title, keywords, or abstract used the phrase. The counts were obtained on June 13, 2018.
This post was sent to the American Statistical Association Community email list, the anzstat email list, the Statistical Society of Canada email list (d-ssc), and was announced in Allstat, all on or shortly after July 22, 2018.
Original Message:
Sent: 09-12-2017 20:14
From: Donald Macnaughton
Subject: Should we change the name of the field of statistics to "data science"?
Dear Colleagues,
Attached is a short essay about changing the name of the field of statistics to "data science". The essay discusses significant advantages of changing the name and suggests that the disadvantages are minimal.
Best regards,
Donald B. Macnaughton
MatStat Research Consulting
donmac@matstat.com