This piece is co-authored with Ron Wasserstein, Executive Director for the American Statistical Association (ASA).
The White House Office of Science and Technology Policy is hosting a “Big Data event” tomorrow, March 29, at AAAS, featuring representatives from NSF, NIH, NIST, DOD, DARPA and DOE. Beyond the list of speakers, little information is available about what will be announced. We’ll keep you informed as details emerge. In the meantime, we want your input.
There is no question Big Data has hit the business, government and scientific sectors. Unfortunately, the role of statistics seems too often to be undervalued. Instead, computer science, applied math or other fields are frequently mentioned as the pertinent scientific discipline while statistics is often left out.
The ASA is exploring how it can ensure statistics fulfills its potential in addressing the big data challenges in the industry, government, and science sectors. We’d like your input. Please use the comment space below to give us your ideas for the role of statisticians and the ASA in Big Data (or post your own blog entry).
Some ASA members have already expressed their opinions in Amstat News.
In a September 2010 article, “Statistics Ready for a Revolution,” Mark van der Laan and Sherri Rose say the next generation of statisticians must build tools for massive data sets. Their article begins: “The statistics profession has reached a tipping point. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready for a revolution, one driven by clear, objective benchmarks by which tools can be evaluated. The new generation of statisticians must be ready to take on this challenge.”
ASA Presidents have also been active. Over the past year, ASA President Bob Rodriguez has been giving numerous talks on big data and statistics. His keynote address for the inaugural ASA Conference on Statistical Practice featured the big data challenges for the business community. Earlier this month he gave a presentation at Arizona State University titled, “Business Analytics and Big Data: Is the Statistics Profession Ready?”
In her March 16 AAAS article, “Cutting Edge: Emerging trends in biostatistics,” 2013 President Marie Davidian provides examples of the big data challenges for biomedical and biological science and the opportunities for biostatisticians “to collaborate with the scientists generating the data to develop innovative new theory and methods to tackle problems never envisioned by the biostatisticians of yesterday.”
2010 President Sastry Pantula has repeatedly challenged the statistical community to play a central role in the data tsunami, and continues to do so. In his April 2010 Amstat News column, “Be a Proud Statistician,” he writes, “Data warehousing, retrieving, and mining important information out of the large data sets pose many challenges for the future,” and then poses the question, “Are we training newer statisticians with appropriate analytical, computational, and communication skills as well as new measurement theory and applications?”
As the lead coordinator of this year’s Math Awareness Month, the ASA promoted the theme “Mathematics, Statistics, and the Data Deluge.” Please visit that website and follow it on Twitter: @MathAware.
Financial services, retail, internet searches, and social media are among the largest big-data drivers in the private sector. One bright spot from the business community was last year’s McKinsey report, “Big data: The next frontier for innovation, competition, and productivity,” which discusses the role of statistics prominently. The following quote appears in the executive summary:
A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data …. we project that demand for deep analytical positions in a big data world could exceed the supply being produced on current trends by 140,000 to 190,000 positions (Exhibit 4).
In the scientific community, Nature and Science magazines have both dedicated issues to data (http://www.nature.com/news/specials/bigdata/index.html and http://www.sciencemag.org/site/special/data/), yet both neglected to mention statistics. On the other hand, the following quote from Robert Tibshirani in the January 26 New York Times “Bits” piece, “What Are the Odds That Stats Would Be This Popular?”, indicates statisticians are playing a central role:
“Most of my life I went to parties and heard a little groan when people heard what I did. Now they’re all excited to meet me.”
The Big Data era also has implications for the federal statistical system as noted in this year’s Economic Report of the President:
The growing integration of technology in our daily lives has created an abundance of new possibilities for producing better and more timely data based on nontraditional sources of information. As Census Bureau Director Robert Groves has written, “(t)he volume of data generated outside the government statistical systems is increasing much faster than the volume of data collected by the statistical systems; almost all of these data are digitized in electronic files” (Groves 2012). Nontraditional sources of information include both digital administrative data (e.g., tax records and records related to participation in government transfer programs) and records generated in the private sector (e.g., data from Internet searches, scanner data and social media data).
Post your comments below or as a blog entry on the role of statisticians in Big Data and what the ASA can be doing for statistics to fulfill its potential to help solve the Big Data challenges. We’d also welcome your favorite articles (and/or quotes) on the topic. Watch also for Tweets on the topic: @Ron_Wasserstein; @ASA_SciPol; @AmstatNews; and @MathAware. You can also email comments to Ron Wasserstein and Steve Pierson.