A new issue of Impact (the magazine of the Operational Research Society) includes an article by Geoff Royston entitled Small Data. Geoff notes that there is much talk about Big Data in analytical and management circles nowadays. As he explains, the landscape of the digital world features vast data mountains thrown up by business transactions, public services and social communication. Computers and analytical techniques allow these to be mixed, matched and mined rapidly and extensively, in order to find trends, patterns and connections in areas such as customer purchases, population health and even popular culture.
But are more data always the answer?
To answer that question, let's look at a story:
Andy Haldane, chief economist of the Bank of England, has recently reported that the 'Michael Fish moment' of failing to predict the banking crash of 2008 highlighted a crisis in economic forecasting, and that only big data could transform economic forecasting in the same way it has improved weather forecasting. But could it? The answer is no, because it was not a data problem. What was needed to gauge the risk of a financial storm was a better understanding of some basic statistical concepts and a proper, realistic financial model. And, as in any financial bubble, there were behavioural factors at play too, which make economic forecasting an even more uncertain business than predicting the weather. Big data, valuable though it undoubtedly is, will not be enough.
It is a shame, but I do not have time to summarize the whole of the article, as much as I would love to.
But here are the main points made by Geoff:
The story of big data is an inversion of the story of statistics: a key concept in statistics is that it is not necessary to measure all of a large population in order to establish its key features; a sample should generally suffice.
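As a toy illustration of that point, here is a minimal Python sketch (the population distribution and the sample size are my own illustrative choices, not from the article): even a modest random sample recovers the mean of a million-item population quite closely.

```python
import random

random.seed(0)

# Hypothetical "population" of a million values that we pretend
# we cannot afford to measure in full.
population = [random.gauss(50, 10) for _ in range(1_000_000)]

# A modest random sample is generally enough to estimate key features.
sample = random.sample(population, 1_000)

pop_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

With a standard deviation of 10 and a sample of 1,000, the standard error of the sample mean is about 0.3, so the estimate typically lands well within one unit of the true mean.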
Considerable work has been devoted in recent years to making the best use of very small, cost-effective samples.
The advent of big data has sometimes been taken to indicate that, as huge volumes of data of all varieties can now be collected so easily and cheaply, small data (and maybe even the discipline of statistics) are no longer important. But many small-data problems also occur in big data:
- irrelevance (much big data is passively found, whereas small data is actively sought)
- errors (of collection or recording)
- noise (finding a needle in a haystack)
- sampling bias (another consequence of the 'found' nature of much big data; even if your data come from the usage records of 50 million smartphones, you are still sampling only smartphone users)
- false positives (while all car owners may buy things at garages, not everybody who buys at a garage owns a car)
- historical bias (the past is not always a good basis on which to predict the future, especially in turbulent times)
- multiple-comparison hazards (test a big data set for enough relationships and some associations will eventually turn up)
- risk of confusing correlation with causation (increases in autism correlated with increases in vaccination, but there is no causal link)
I took a photo of the article for those who want to read it. See attachments.
------------------------------
Robert Pieczykolan
------------------------------
Original Message:
Sent: 08-21-2017 10:15
From: Kelly Zou
Subject: Your Thoughts on this Topic - "Data Science: The End of Statistics?"