Hi Emmeline,
Interesting question. I've seen many statisticians handle this differently. I worked in big pharma for quite a while, and I always tried to stay close to the data management and data entry folks. I used to tell them that they do a wonderful job of cleaning the database, but we statisticians look at the data in a different way. My explanation is to think of the data as a forest. The data management folks often do a great job of pruning each individual tree. But when they are done, we statisticians have the opportunity to stand on a hill, look out over the whole forest, and spot outliers or values that look different.
So what can you do, or what should you do, to help clean the data?
Some companies have SOPs that address this, but here is what I have found helpful. It can change from study to study, depending on the length and size of the study: some things work for smaller studies and others work for larger ones.
1. Look at the listings. Look for gaps in the data and for strange findings.
2. I like to run PROC UNIVARIATE on all the numeric variables, with the PLOT option on, to check for outliers or strange values.
3. Plots of one variable against another, possibly just for the important variables.
4. Frequency tables for the categorical variables. See if the values all make sense.
5. Finally, I like to send all the data through the planned analyses prior to unblinding to see if anything kills the analyses. I would hate to see this happen after database lock.
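
For anyone who wants to script checks 1, 2 and 4, here is a rough sketch. It is in plain Python rather than SAS, so it is only a stand-in for the listings, PROC UNIVARIATE and frequency tables, not those tools themselves. The records, the variable names, and the 1.5 x IQR rule for flagging outliers are all my assumptions for illustration; your SAP or SOPs may call for something different.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical records: (subject id, sex, systolic blood pressure in mmHg).
records = [
    ("001", "M", 118), ("002", "F", 122), ("003", "F", 125),
    ("004", "M", 130), ("005", "m", 127), ("006", "F", None),
    ("007", "M", 121), ("008", "F", 1240), ("009", "M", 124),
    ("010", "F", 126),
]


def missing_subjects(rows):
    """Check 1: list subjects with a gap (missing value) in the numeric field."""
    return [subj for subj, _, value in rows if value is None]


def flag_outliers(rows, k=1.5):
    """Check 2: flag values outside Q1 - k*IQR and Q3 + k*IQR (assumed rule)."""
    observed = [value for _, _, value in rows if value is not None]
    q1, _, q3 = quantiles(observed, n=4)  # quartiles of the non-missing values
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [(subj, value) for subj, _, value in rows
            if value is not None and (value < low or value > high)]


def freq_table(rows):
    """Check 4: count each category code so stray codes stand out."""
    return Counter(sex for _, sex, _ in rows)


missing = missing_subjects(records)
outliers = flag_outliers(records)
sex_counts = freq_table(records)

print("Missing values:", missing)
print("Possible outliers:", outliers)
print("Sex codes:", dict(sex_counts))
```

In this made-up example the value 1240 looks like a data-entry slip and the lowercase "m" is a stray code. Both are the kind of thing I would send back to data management as a query rather than change myself.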
These are just some ideas on how a statistician can help with data cleaning.
Now your last statement is interesting. My feeling is that your organization is only responsible for providing the data as it was intended to be used in the protocol and SAP. Nothing more.
I hope this helps,
-------------------------------------------
Rocco Brunelle
Senior Statistician
Bowsher Brunelle Smith LLC
-------------------------------------------