ASA Connect


"Big data is dead. Data is just data."

  • 1.  "Big data is dead. Data is just data."

    Posted 12-01-2017 10:01

    This essay by Alexander Thamm provides a perspective I think most statisticians can align with:

    https://www.linkedin.com/pulse/big-data-dead-just-regardless-quantity-structure-speed-thamm

     

     

    John A. Major, ASA, MAAA

    Director of Actuarial Research, Guy Carpenter Analytics

    1166 Avenue of the Americas

    New York, NY 10036

    860-839-9148 (cell)

    https://scholar.google.com/citations?user=9b7B9C4AAAAJ&hl=en

     






  • 2.  RE: "Big data is dead. Data is just data."

    Posted 12-04-2017 08:54
    Edited by David Corliss 12-05-2017 02:35
    Many important insights here! While many of us have long been cautious of buzzwords like "Big Data", this article details when more data is needed, when it isn't, and underscores the practices we need to be pursuing. It's good to see people realizing that putting more records in the model just for the sake of more records doesn't automatically make the model better. Maybe it helps, maybe not, maybe it doesn't help enough to be worth the additional effort and resulting delay in taking action on the results. Careful design of experiments, selection of the right modeling methods, and scientific sampling will continue to have a significant impact. Relying on brute force alone, unaided by careful science, will give us brutish models.

    ------------------------------
    David J Corliss, PhD
    Analytics Architect / Predictive Analytics
    Ford Motor Company
    ------------------------------



  • 3.  RE: "Big data is dead. Data is just data."

    Posted 12-05-2017 15:59
    I have been fighting against this term and mindset for many years.

    In the credit and insurance community, large data sets have been used for almost 30 years to build risk and marketing models. Typical data sets can easily run from 10,000 to millions of observations. In many of these analyses, variables must be significant at the .0001 level or beyond to be included in a model. Even so, many of the additional variables that reach that level of significance add little or no meaningful accuracy to the model while adding complexity, and what accuracy they do add is minimal and brings other issues, such as confusing reason codes.

    The point is that more data often adds nothing. In fact, better samples often make more of a difference than larger samples. For example, make sure that the sample covers an appropriate time frame or an appropriate set of geographic areas.

    Big data never was and never will be the best solution for good statistical analysis.
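
    The point about significance in very large samples can be sketched numerically. This is only an illustration under assumed numbers (a shift of 0.01 standard deviations, a one-sample z-test with known standard deviation); the function names are hypothetical and nothing here comes from an actual credit or insurance model. It shows how a practically negligible effect moves from "not significant" to far beyond the .0001 level purely because n grows.

```python
import math


def z_for_mean_shift(effect: float, sd: float, n: int) -> float:
    """z statistic for a one-sample mean shift of `effect`, known `sd`, sample size `n`."""
    return effect / (sd / math.sqrt(n))


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Assumed, illustrative effect: 1% of a standard deviation.
effect, sd = 0.01, 1.0
for n in (10_000, 1_000_000, 10_000_000):
    z = z_for_mean_shift(effect, sd, n)
    print(f"n={n:>10,}  z={z:6.2f}  p={two_sided_p_from_z(z):.3g}")
```

    The effect size never changes in this sketch; only the sample size does, which is why statistical significance alone says little about added accuracy.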

    Michael L. Mout, MS, Cstat, Csci
    MIKS & Assoc. - Senior Consultant/Owner
    4957 Gray Goose Ln, Ladson, SC 29456
    804-314-5147(Mbl), 843-871-3039 (Home)





  • 4.  RE: "Big data is dead. Data is just data."

    Posted 12-07-2017 13:33
    Is it possible that "Big Data" was replaced with "Machine Learning" and "Data Science" because those are the types of skills needed to analyze "Big Data"? 

    For anyone who thinks data is just data, try working at Ford on autonomous cars or at the U of Michigan health system taking signals from medical equipment. Both require taking live data from multiple streams of sensor arrays, or arrays of sensors, and determining, "Is this data good? If so, then what?" Get it right, no one knows or cares and things get better. Get it wrong, people die.

    Having worked with this type of data, it's not uncommon for each sensor to send 100Hz to 1,000Hz of signal data times X number of sensors times Y hours.... I think that gets pretty big. Thankfully, there is Machine Learning.
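
    To put rough numbers on that product, here is a small sketch. The post leaves X and Y open, so the 50 sensors and 8 hours below are assumptions chosen only for illustration, and the function name is hypothetical.

```python
def samples_collected(rate_hz: float, n_sensors: int, hours: float) -> int:
    """Total readings: rate (samples/second) x sensors x seconds recorded."""
    return round(rate_hz * n_sensors * hours * 3600)


# Assumed values for X (sensors) and Y (hours); the post does not give them.
n_sensors, hours = 50, 8
for rate_hz in (100, 1_000):
    total = samples_collected(rate_hz, n_sensors, hours)
    print(f"{rate_hz:>5} Hz x {n_sensors} sensors x {hours} h = {total:,} readings")
```

    Under these assumptions a single recording session already runs from hundreds of millions to over a billion readings.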

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 5.  RE: "Big data is dead. Data is just data."

    Posted 12-08-2017 08:28

    I hope so.  I am growing weary of the phrase.  Besides, as you point out, it is not a new concept to have very large datasets.  I won't miss it.

     

    Susan E. Spruill

    Susan E. Spruill, PStat®

    Statistical Consultant, President

    Applied Statistics and Consulting

    828-467-9184 (phone)

    Professional Statistician accredited by the American Statistical Association

    www.appstatsconsulting.com

     






  • 6.  RE: "Big data is dead. Data is just data."

    Posted 12-08-2017 09:36





  • 7.  RE: "Big data is dead. Data is just data."

    Posted 12-08-2017 10:31
    Well, for the sake of the weekend: there is an old joke among astrophysicists about the rate of paper publication on dark matter, string theory, or whatever topic you want to mock. It can easily be applied to Big Data. And it includes - as most jokes do - an important truth:

    "Imagine all new data created in all connected networks (aka internet) as a growing pile of paper. The velocity of its growth has already exceeded c - or at least will do so soon enough.

    Why is this not a contradiction to the theory of relativity?

    - Simple: The information added with each new dataset is zero."

    Best regards and a creative (or lazy - whichever you prefer) weekend,
    Christian


    PS.: The challenge in 'Big Data' is often to pick the information from a huge pile of mostly redundant or unimportant stuff and not to fall for the noise. And yes - 'machine learning' helps a great deal with that. However, when I was still a student, we simply called it 'numerics' or 'applied mathematics' ;-).

    ------------------------------
    Christian Graf
    Dipl.-Math.
    Qualitaetssicherung & Statistik

    "To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."

    Ronald Fisher in 'Presidential Address by Professor R. A. Fisher, Sc.D., F.R.S. Sankhyā: The Indian Journal of Statistics (1933-1960), Vol. 4, No. 1 (1938), pp. 14-17'
    ------------------------------



  • 8.  RE: "Big data is dead. Data is just data."

    Posted 12-11-2017 17:10
    What would they say about a method that does not even bother to gather new data but rather resamples data you already have? How redundant would that method be? 

    If given the option of resampling the old or gathering the new, which is better?
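
    The post does not name the method. Assuming it means bootstrap-style resampling, a minimal sketch looks like the following. The data are simulated, the function name is hypothetical, and only the Python standard library is used. The sketch resamples one fixed data set with replacement to estimate the standard error of its mean, then prints the usual formula-based value for comparison.

```python
import random
import statistics


def bootstrap_se_of_mean(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling `data` with replacement."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)


# Simulated stand-in for "data you already have".
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

print("bootstrap SE:", bootstrap_se_of_mean(data))
print("formula SE:  ", statistics.stdev(data) / len(data) ** 0.5)
```

    Resampling quantifies the uncertainty in the observations already in hand; it does not add new information the way newly gathered data would.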

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------



  • 9.  RE: "Big data is dead. Data is just data."

    Posted 12-08-2017 10:21

    Dear All,

     

    Although "Big Data" may have various V's and interpretations, the term "Real World Evidence" has been defined in the U.S. 21st Century Cures Act.

     

    Please see my recent interview in AmStat News and a podcast with the ASA BIOP Section.

     

    http://magazine.amstat.org/blog/2017/10/01/kelly-h-zou-on-real-world-evidence   

     

    http://community.amstat.org/biop/podcast  

     

    Happy Holidays!

     

    Kelly

    Chair, ASA SPAIG Committee

    Chair-Elect, ASA HPSS Section






  • 10.  RE: "Big data is dead. Data is just data."

    Posted 12-11-2017 09:14
    It has been fun to read this material and three thoughts occurred to me that may be helpful:
    1. John Tukey in the early 60's defined big data as "a data set too big to fit on one device". The idea was that when you needed multiple devices you needed much more complicated procedures to analyze them. I think his definition still holds. Of course, when Tukey was writing he was talking about tape drives and now they are peta-byte devices (and more), but the same idea is useful.
    2. What is statistics if not a set of procedures to tell a lot about a large data-set from smaller samples? Since the precision of our measures goes as the square root of the sample size, there is a diminishing return.
    3. And what about continuous streams of data? Yes, such streams, if digitized, contain immense numbers of data points, but such a digitization only approximates the continuous nature of the stream. Why not treat the stream as a single continuous function and use the tools of functional data analysis pioneered by Jim Ramsay and his colleagues?
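
    The diminishing return in point 2 can be made concrete with a short sketch. It assumes independent observations with a standard deviation of 1 (an illustrative value) and uses the textbook standard error of a mean, sd / sqrt(n); the function name is hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean for n independent observations."""
    return sd / math.sqrt(n)


# Each 100-fold increase in sample size buys only a 10-fold gain in precision.
for n in (100, 10_000, 1_000_000):
    print(f"n={n:>9,}  standard error={standard_error(1.0, n):.4f}")
```

    Put another way, halving the standard error requires four times as many observations.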

    Just my 2¢ on a Monday morning.

    ------------------------------
    Howard Wainer
    Extinguished Research Scientist
    ------------------------------