Generally, what happens with a simulation is that you run a simple program and it generates thousands of data points. You keep those data points until you have time to process them.
Depending on how a simulation is run, taking a sample of the data may yield false results.
Look at BOINC: its projects have large data sets or simulation models that take hours to run. Part of the reason BOINC exists is that those projects need help processing all that data to get it down to a manageable size.
I was at a talk a few days ago on "Big Data" and biological simulations. The researchers have dozens of servers working 24/7 trying to process all the data their simulations generate. Only 1% of the data might be valuable. But do you want to pass up the cure for (enter disease name here)?
The final data set might fit on a flash drive, but it will take petabytes of data and petaflops of computing to get there.
------------------------------
Andrew Ekstrom
Statistician, Chemist, HPC Abuser;-)
Original Message:
Sent: 03-07-2016 12:21
From: Michael Mout
Subject: Big Data Software
In general, you probably don't need Big Data unless you are looking for tiny niches of importance (<1%).
For most analyses of the population being studied, you can typically take a reasonably sized random sample and analyze it with any common software package. For most purposes, a sample of 10K to 100K records is more than adequate and easily handled by most packages.
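To illustrate the random-sampling point, here is a minimal Python sketch (standard library only). Everything in it is a made-up assumption rather than real data: a hypothetical population of one million normally distributed values with mean 50 and standard deviation 10, and a cutoff of 80 used to define a rare subgroup of roughly 0.1% of records. It draws a 10K random sample, compares the sample mean with the population mean alongside the usual standard-error estimate, and counts how many members of the rare subgroup land in the sample.

```python
import math
import random

def mean_and_sd(xs):
    """Return the sample mean and sample standard deviation (n - 1 denominator)."""
    n = len(xs)
    mean = math.fsum(xs) / n
    var = math.fsum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(var)

rng = random.Random(2016)  # fixed seed so the run is repeatable

# Hypothetical population: 1,000,000 records, normal with mean 50 and SD 10.
population = [rng.gauss(50.0, 10.0) for _ in range(1_000_000)]
pop_mean, _ = mean_and_sd(population)

# Simple random sample of 10,000 records, drawn without replacement.
sample = rng.sample(population, 10_000)
sample_mean, sample_sd = mean_and_sd(sample)
std_error = sample_sd / math.sqrt(len(sample))

# A hypothetical "tiny niche": values above 80, roughly 0.13% of records.
RARE_CUTOFF = 80.0
rare_in_population = sum(1 for x in population if x > RARE_CUTOFF)
rare_in_sample = sum(1 for x in sample if x > RARE_CUTOFF)

print(f"population mean = {pop_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f} (standard error ~ {std_error:.3f})")
print(f"rare records: {rare_in_population} in population, {rare_in_sample} in sample")
```

With 10K records the standard error of the mean is about 10/√10,000 = 0.1, which is why such a sample is plenty for overall estimates. The same sample, however, is expected to hold only a dozen or so members of a 0.1% subgroup, which is where the full data set earns its keep.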
In fact, Big Data can often produce misleading results, showing statistically significant relationships where none of practical importance exist, simply because the sample size is so large. This is also true of smaller samples, say 100K. The key when looking at results from large samples is to consider not only the statistical significance of the results but also whether they are meaningful.
For example, a simple t-test may show a difference significant at p < .001, but the actual difference may be very small. Consulting SMEs (subject matter experts) is helpful in this regard.
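The t-test point can be made concrete with a small simulation. The sketch below, in Python with only the standard library, compares two hypothetical groups of 500,000 values whose true means differ by just 0.02 standard deviations; the group sizes, means, and seed are invented for illustration. It computes Welch's t statistic, a two-sided p-value using the normal approximation to the t distribution (reasonable only at sample sizes this large), and Cohen's d as an effect-size measure.

```python
import math
import random

def mean_and_var(xs):
    """Return the sample mean and sample variance (n - 1 denominator)."""
    n = len(xs)
    mean = math.fsum(xs) / n
    var = math.fsum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var

def welch_t_test(a, b):
    """Welch's two-sample t statistic and a two-sided p-value.

    The p-value uses the standard normal approximation to the t
    distribution, which is only accurate for large samples like these.
    """
    mean_a, var_a = mean_and_var(a)
    mean_b, var_b = mean_and_var(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    t = (mean_a - mean_b) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for standard normal Z
    return t, p

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    mean_a, var_a = mean_and_var(a)
    mean_b, var_b = mean_and_var(b)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

rng = random.Random(2016)  # fixed seed so the run is repeatable
n = 500_000
# Hypothetical groups whose true means differ by only 0.02 standard deviations.
group_a = [rng.gauss(0.02, 1.0) for _ in range(n)]
group_b = [rng.gauss(0.00, 1.0) for _ in range(n)]

t, p = welch_t_test(group_a, group_b)
d = cohens_d(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.2e}, Cohen's d = {d:.4f}")
```

With these settings the expected t statistic is about 0.02 / √(1/500,000 + 1/500,000) = 10, so p falls far below .001, while Cohen's d stays near 0.02, a difference most SMEs would likely call negligible.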
Michael L. Mout, MS, Cstat, Csci
MIKS & Assoc. - Senior Consultant/Owner
4957 Gray Goose Ln, Ladson, SC 29456
804-314-5147(Mbl), 843-871-3039 (Home)