I should have mentioned that looking for suspicious values is part of data cleaning/prep. Obviously a value of 6 is outside the range of legitimate values for a 5-point Likert response scale. With a medium-sized survey of 1 million cases, it is often useful to use <data> <validation> in SPSS and continually build up the set of rules. It has options to reuse existing rules and to add new ones for out-of-range values, skip-pattern violations, inconsistencies, etc. If you use another package, you can continuously develop syntax to do these kinds of validation. Also, substantive knowledge is important in examining distributions. Thorough completion and understanding of metadata is a critical part of data prep. Comments and documents, variable labels, valid and missing value labels, level of measurement, and readable output display formats are all needed.
The search for artifacts/anomalies should continue throughout the analysis. For example, finding a five-way interaction is very possibly due to data entry error.
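
For anyone working outside SPSS, here is a minimal sketch of what a growing rule set might look like in Python. The variable names (q1, employed, job_sat), the rule names, and the sample cases are all made up for illustration; the point is only that each rule is a small function and new ones get added to the dictionary as you find new problems.

```python
# Hypothetical rule set: field names and rules are illustrative only.
def likert_in_range(case):
    """q1 must be a 1-5 Likert response, or None if missing."""
    return case.get("q1") is None or case["q1"] in {1, 2, 3, 4, 5}

def skip_pattern_ok(case):
    """If the respondent is not employed, job_sat must be blank."""
    return case.get("employed") == "yes" or case.get("job_sat") is None

# Add new rules here as the cleaning work turns up new kinds of problems.
RULES = {
    "q1_out_of_range": likert_in_range,
    "job_sat_skip_violation": skip_pattern_ok,
}

def validate(cases, rules=RULES):
    """Return {rule_name: [ids of cases that violate the rule]}."""
    violations = {name: [] for name in rules}
    for case in cases:
        for name, rule in rules.items():
            if not rule(case):
                violations[name].append(case["id"])
    return violations

cases = [
    {"id": 1, "q1": 4, "employed": "yes", "job_sat": 3},
    {"id": 2, "q1": 6, "employed": "yes", "job_sat": 2},   # 6 is not a legal value
    {"id": 3, "q1": 2, "employed": "no", "job_sat": 5},    # should have been skipped
    {"id": 4, "q1": None, "employed": "no", "job_sat": None},
]
report = validate(cases)
print(report)
```

Flagged cases still need a human decision (fix, set to missing, or go back to the source); the rules only tell you where to look.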
-----
If you do imputation, it is often advisable to try different approaches to missing values and compare the results: listwise deletion, pairwise deletion, and value substitution.
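
To make the difference between the two deletion approaches concrete, here is a small sketch in Python. The data and variable names (a, b, c) are hypothetical, and None stands in for a missing value. Listwise deletion keeps only fully complete cases, while pairwise deletion keeps, for each pair of variables, every case that is valid on both.

```python
from itertools import combinations

# Hypothetical data: None marks a missing value.
cases = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": None, "c": 1},
    {"a": None, "b": 4, "c": 5},
    {"a": 4, "b": 5, "c": None},
    {"a": 5, "b": 3, "c": 2},
]
variables = ["a", "b", "c"]

def listwise(cases, variables):
    """Keep only cases that are complete on every variable."""
    return [c for c in cases if all(c[v] is not None for v in variables)]

def pairwise_counts(cases, variables):
    """For each pair of variables, count the cases valid on both."""
    return {
        (x, y): sum(1 for c in cases if c[x] is not None and c[y] is not None)
        for x, y in combinations(variables, 2)
    }

n_listwise = len(listwise(cases, variables))
n_pairwise = pairwise_counts(cases, variables)
print(n_listwise)   # cases usable for every analysis
print(n_pairwise)   # cases usable for each pair of variables
```

In this toy data listwise deletion leaves 2 of 5 cases, while every pair of variables still has 3 usable cases. Different pairs rest on different subsets of cases, which is one reason to compare the approaches.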
Value substitution can be done by linear interpolation, mean of all other cases with valid values, mean of a fixed number of cases before and after the case that has a missing value, median of all other cases with valid values, median of a fixed number of cases before and after the case that has a missing value, trend if there is other information, mean of other items in a scale, or median of other items in a scale. Sometimes the cases used to find a value to substitute are from the same cell (intersection of strata by clusters), and sometimes they are drawn from all cases without regard to stratum or cluster. Hot-deck methods are sometimes used.
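
A few of the substitution methods listed above can be sketched in plain Python as follows. The function names and the sample series are hypothetical, None marks a missing value, and the cases are assumed to be in a meaningful order (needed for the neighbor and interpolation methods). This ignores strata and clusters entirely; restricting the donor cases to the same cell would mean calling these functions on each cell's cases separately.

```python
from statistics import mean, median

def fill_with(values, stat):
    """Replace None with stat() of all valid values (e.g. mean or median)."""
    valid = [v for v in values if v is not None]
    fill = stat(valid)
    return [fill if v is None else v for v in values]

def fill_neighbors(values, k, stat=mean):
    """Replace None with stat() of up to k valid values before and k after.

    Raises statistics.StatisticsError if a missing value has no valid neighbors.
    """
    out = []
    for i, v in enumerate(values):
        if v is not None:
            out.append(v)
            continue
        before = [x for x in values[:i] if x is not None][-k:]
        after = [x for x in values[i + 1:] if x is not None][:k]
        out.append(stat(before + after))
    return out

def fill_linear(values):
    """Linear interpolation between the nearest valid neighbors.

    Missing values before the first or after the last valid value stay None.
    """
    out = list(values)
    valid_idx = [i for i, v in enumerate(values) if v is not None]
    for lo, hi in zip(valid_idx, valid_idx[1:]):
        step = (values[hi] - values[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            out[i] = values[lo] + step * (i - lo)
    return out

series = [2, 4, None, 10, 3]   # hypothetical ordered measurements
print(fill_with(series, mean))
print(fill_with(series, median))
print(fill_neighbors(series, k=1))
print(fill_linear(series))
```

The same missing value gets a different substitute depending on the method, which is why it helps to run the analysis under more than one.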
The time to reduce measurement error is mostly when you are developing the data gathering instrument. Cognitive testing is vital. Look intensively for differences in nuances, denotations, and connotations within and across cultures/languages/disciplines. Use as fine-grained a response scale as is practical for the respondents in your rounds of pre-testing. You can always coarsen measurement post hoc, but you cannot refine it. Despite common usage, if one uses the term literally, it is impossible to disaggregate data. If you want results by department, product type, shift, etc., you must gather data by department, product type, shift, etc.
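
The one-way nature of coarsening is easy to see in a small Python sketch. The 7-point-to-3-category mapping and the responses below are hypothetical. Collapsing is a simple lookup, but going the other way only gives a set of possible original values.

```python
# Hypothetical mapping from a 7-point agreement scale to 3 categories.
COARSEN_7_TO_3 = {
    1: "disagree", 2: "disagree", 3: "disagree",
    4: "neutral",
    5: "agree", 6: "agree", 7: "agree",
}

responses = [1, 4, 7, 5, 3, 6]   # made-up fine-grained responses

# Coarsening post hoc is always possible.
coarse = [COARSEN_7_TO_3[r] for r in responses]

# Refining is not: each coarse label maps back to several possible values.
candidates = {
    label: [k for k, v in COARSEN_7_TO_3.items() if v == label]
    for label in set(coarse)
}
print(coarse)
print(candidates)
```

Once only the coarse labels are stored, nothing in the data says whether an "agree" was originally a 5, 6, or 7.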
Total uncertainty is made up of sampling error AND measurement error. In my experience, YMMV, the uncertainty due to measurement considerations is often much larger than that due to sampling. Sampling error is very much a lower bound on how much uncertainty there is. However, we usually have more certainty about the amount of sampling uncertainty.
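
As a rough illustration of why sampling error is only a lower bound, here is a sketch in Python. It rests on a simplifying assumption that is mine and not part of the argument above: that the two error sources are independent, so their variances add. The standard-error figures are invented.

```python
import math

def total_se(sampling_se, measurement_se):
    """Combine two standard errors, ASSUMING independent error sources
    (variances add, so the standard errors combine as a root sum of squares)."""
    return math.hypot(sampling_se, measurement_se)

sampling_se = 3.0      # hypothetical, e.g. from the sampling design
measurement_se = 4.0   # hypothetical, and usually much harder to pin down

total = total_se(sampling_se, measurement_se)
print(total)
```

Under that assumption the total can never be smaller than the sampling component alone, and when the measurement component is the larger one it dominates the total.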
-------------------------------------------
Arthur Kendall
Social Research Consultants
-------------------------------------------