Hi Sayan,
That question makes perfect sense to me!
One obvious example is biomedical data, where you have several inferential steps on the "rawer" forms of the data prior to using anything for clinical inference.
If you imagine a vital sign time series, for example from a bedside monitor, the data display is rife with artefacts. You may wish to remove these artefacts before inferring the clinical condition. At the simplest end of the spectrum, your "ML" algorithm could just be a running mean/median that smooths out vital sign artefacts. At the more complex end of the spectrum, it could be a Gaussian process regressor (or choose your favourite ML classifier) that compares the variance/covariance of each measurement to its neighbours in the time series (and across multiple features) to detect novelty or unusual dynamics.
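To make the "simplest end of the spectrum" concrete, here is a minimal sketch of a running-median smoother in plain Python (the window size and the example heart-rate values are illustrative, not from any real monitor):

```python
from statistics import median

def running_median(series, window=5):
    """Smooth a vital-sign series with a sliding median.

    Spike artefacts (e.g. a probe briefly disconnecting) are
    replaced by the median of the surrounding window, while
    genuine slow trends pass through largely unchanged.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)            # clamp window at the edges
        hi = min(len(series), i + half + 1)
        smoothed.append(median(series[lo:hi]))
    return smoothed

# A hypothetical heart-rate trace with one artefactual spike at index 3:
hr = [72, 74, 73, 250, 75, 74, 73]
print(running_median(hr))  # the 250 spike is smoothed away
```

A median is usually preferred over a mean here because a single extreme spike barely shifts the median of the window, whereas it drags the mean far from the true signal.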
So in this example, you've done quite a bit of ML prior to the primary clinical inference. On top of that, you could also examine the ML techniques used before (e.g. in the signal processing that derives vital sign estimates from the raw waveforms) and after (to detect conditions like atrial fibrillation, hypoxia, general deterioration, etc.)
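A much simpler stand-in for the GP-regressor idea above is to score each measurement by how far it sits from its neighbours, using a robust z-score (local median and median absolute deviation). This is only a sketch; the window, threshold, and data are all made up for illustration:

```python
from statistics import median

def flag_artefacts(series, window=5, threshold=4.0):
    """Flag samples that deviate strongly from their neighbours.

    Each point is scored by its distance from the median of the
    surrounding points (excluding itself), scaled by the local
    median absolute deviation (MAD). Points whose robust z-score
    exceeds `threshold` are flagged as likely artefacts.
    """
    half = window // 2
    flags = []
    for i in range(len(series)):
        # Neighbours on either side, excluding the point itself
        neighbours = series[max(0, i - half):i] + series[i + 1:i + half + 1]
        m = median(neighbours)
        mad = median(abs(x - m) for x in neighbours) or 1e-9  # avoid /0
        z = abs(series[i] - m) / (1.4826 * mad)  # 1.4826 ~ MAD-to-sigma
        flags.append(z > threshold)
    return flags

hr = [72, 74, 73, 250, 75, 74, 73]
print(flag_artefacts(hr))  # only the 250 spike is flagged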
Here are some links to continue your search. A Google search of "novelty detection" is also a good idea.
GPs for artefact identification:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.8305&rep=rep1&type=pdf
Plenty of ML here to deal with artefact and noise:
https://www.ncbi.nlm.nih.gov/pubmed/25570111
https://physionet.org/challenge/
A low-computation example for embedded devices:
http://www.robots.ox.ac.uk/~sjrob/Pubs/embc2017.pdf
A great summary article on novelty detection (which may or may not be applied to artefacts):
http://www.robots.ox.ac.uk/~davidc/pubs/NDreview2014.pdf
Hope this helps!
------------------------------
Glen Wright Colopy
DPhil Student
University of Oxford
------------------------------
Original Message:
Sent: 01-18-2018 09:26
From: Sayan Datta
Subject: Automate Data Cleaning using Machine Learning or Artificial Intelligence
Hello
Can anyone share any information on the use of AI or ML in data cleaning? To clarify, I am referring to the cleaning of data (missing values, outliers, nonsensical values et al) prior to model development. Also the model can be anything - statistical ( say a regression based model) or mathematical.
Not sure if the question posed above is too generic but any inputs will be much appreciated!!! Have a great day!!
Thanks, Sayan