
assuring the quality of statistical deliverables

  • 1.  assuring the quality of statistical deliverables

    Posted 07-14-2011 14:25
    Dear colleagues:

    My MStat students are asking for resources, guidelines, or SOPs to assure the quality of statistical deliverables, such as their SAS code, their report to the client, etc.

    What steps do you take to assure that your analysis is correct and appropriate; that your statistical software code is calculating what you intended; that your report to the client is complete, understandable and accurate, etc.?

    -------------------------------------------
    Marlene Egger
    Professor
    University of Utah, DFPM
    -------------------------------------------


  • 2.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 14:52

    In Pharma, and for SAS software, the standard is for a second programmer to re-create/validate the programs/outputs. On a couple of occasions I have used a second "blinded" programmer (one not involved in the project) to create/validate tabular output.
    We also pre-specify the statistical analyses and have detailed mockup (table/listing/graph) shells, so the analysis is planned in advance.

    For sophisticated statistical analyses, we may bring in an external consultant with expertise in that area (unfortunately not a realistic option for your students).

    For statistical reports, typically a second statistician/manager reviews the outputs for the "statistical integrity".


    -------------------------------------------
    Chris Barker, Ph.D.
    www.barkerstats.com
    President  - San Francisco Bay Area Chapter of the American Statistical Association
    -------------------------------------------








  • 3.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 15:23


    -------------------------------------------
    Rebecca Hoagland
    Consulting Statistician
    -------------------------------------------
    I agree with Christopher. In 20+ years of consulting, I have found that the best way to find mistakes and validate the analyses is to have an independent person redo them. There have been times, even on simple analyses, when I made "stupid" mistakes that I would never have caught if it were not for the independent validator. 90%-95% of the time the independent programmer comes back and confirms my output, but there are still times when they catch something, and you never know whether this is the one time you made some little mistake. I would have the students pair up and double-check each other's work if they are sending things out to clients.







  • 4.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 15:31

    I would fabricate a small dataset on which you can perform the calculations by hand, and check your answers against the SAS program. Another check is to perform the same operations in other software, such as Excel or S-PLUS, to see if you get the same answers.
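    A minimal sketch of the hand-check idea, written in Python only because the thread contains no code of its own. The function summarize() is a hypothetical stand-in for whatever program is being checked; the expected values in the comments were worked out by hand, which is the point of keeping the fabricated dataset tiny.

```python
import statistics

# A tiny, fabricated dataset small enough to work through by hand.
# Hand calculation: sum = 2 + 4 + 4 + 4 + 5 + 5 + 7 + 9 = 40, mean = 40 / 8 = 5
# Squared deviations: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
# Sample variance (n - 1 denominator) = 32 / 7
toy = [2, 4, 4, 4, 5, 5, 7, 9]
HAND_MEAN = 5.0
HAND_VARIANCE = 32 / 7


def summarize(values):
    """Hypothetical 'production' routine under test: returns (mean, sample variance)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


mean, variance = summarize(toy)
# Compare the program's output with the hand-worked answers.
assert abs(mean - HAND_MEAN) < 1e-12
assert abs(variance - HAND_VARIANCE) < 1e-12
# Optionally also compare with an independent implementation.
assert abs(variance - statistics.variance(toy)) < 1e-12
print("hand-checked values match:", mean, round(variance, 4))
```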

    -------------------------------------------
    Brian Taylor
    Operations Research Analyst
    Army Test & Evaluation Command
    -------------------------------------------








  • 5.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 16:59
    I can agree with the first form of checking, but I can't see how the second form accomplishes anything. It is no longer the case, but Excel used to treat blank cells as observations equal to 0 when calculating means from spreadsheet data, and would therefore get incorrect means that could disagree with the same calculation in SAS. If you compute with different software packages, you may merely be checking the quality of the software and not necessarily the correctness of your original analysis. Another possibility is that different packages require the data to be input in different ways. You may be familiar with SAS and do the correct analysis there, because you know how to input the data for proper interpretation, but, being less familiar with Minitab, Excel or SPSS, input the data incorrectly in the second package and get different answers. That difference reflects the improper use of the second package and again says nothing about whether the original SAS calculation was correct.
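    The blank-cell pitfall described above is easy to reproduce. This Python sketch only mimics the two conventions the post contrasts (blanks excluded as missing vs. blanks counted as 0); it makes no claim about how any current spreadsheet function behaves. The readings are made up, with None standing in for a blank cell.

```python
# Hypothetical column of readings in which two cells were left blank (None).
readings = [10.0, 12.0, None, 14.0, None]


def mean_excluding_missing(values):
    """Treat blanks as missing: average only the observed values."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def mean_blanks_as_zero(values):
    """Reproduce the old pitfall: blanks counted as observations equal to 0."""
    filled = [0.0 if v is None else v for v in values]
    return sum(filled) / len(filled)


print(mean_excluding_missing(readings))  # 12.0
print(mean_blanks_as_zero(readings))     # 7.2
```

    The two programs disagree because of a data-handling convention, not because either analysis plan is wrong, which is exactly why a cross-package mismatch needs to be explained before it can tell you anything.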

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 6.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 18:22

    All good suggestions.

    I was in charge of a small biostatistics group: myself and one talented SAS programmer. I did not have the budget or luxury of a second statistician or second programmer.

    I used the procedure described in previous posts of checking the construction of the analysis files.

    Fortunately, that works in part, because SAS/R/S procedures typically expect data to be in a "flat file" format.

    I found there was value in coding statistical procedures in both SAS and S-PLUS, in part because in S-PLUS one was required to write code to verify outputs that were often generated automatically in SAS. For example, so-called "estimates" or "least-squares means" had to be programmed directly in S, but were mostly automatic in SAS. And sometimes the procedures in SAS/S-PLUS/R are almost identical, but their default settings are not the same. For example, if one has a linear model with, say, 2 factors and 10 other covariates, what values of the covariates does SAS (PROC GLM) use in the estimate/lsmeans statement? (This is not a quiz question: SAS uses the mean of each covariate.)

    There of course remains the peril, when coding a procedure in both SAS and S/R, of making "the same mistake twice".

    Sometimes I could verify certain analyses by using "simpler" statistical methods. For example, there are papers in the literature with titles like "analysis by summary statistics"; that approach sometimes worked for certain mixed model formulations. Some things are very difficult to check, for example, whether the iterative estimates from a mixed model are correct.
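    A sketch of the summary-statistics idea, under stated assumptions: each subject has the same number of repeated measurements, and the question is a between-group treatment comparison. Each subject is reduced to one number (its mean), and the groups are compared with an ordinary pooled two-sample t statistic. In a balanced design like this, the estimated difference should typically coincide with the treatment estimate from a simple random-intercept mixed model, which makes it a useful sanity check. The data, group labels and function name are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical repeated-measures data: subject -> (treatment group, measurements).
subjects = {
    "s1": ("control", [4.0, 6.0]),
    "s2": ("control", [6.0, 8.0]),
    "s3": ("control", [8.0, 10.0]),
    "s4": ("treated", [7.0, 9.0]),
    "s5": ("treated", [9.0, 11.0]),
    "s6": ("treated", [11.0, 13.0]),
}


def summary_statistic_analysis(data):
    """Reduce each subject to its mean, then compare groups with a pooled two-sample t."""
    by_group = {}
    for group, values in data.values():
        by_group.setdefault(group, []).append(mean(values))
    ctrl, trt = by_group["control"], by_group["treated"]
    diff = mean(trt) - mean(ctrl)
    n1, n2 = len(ctrl), len(trt)
    pooled_var = ((n1 - 1) * variance(ctrl) + (n2 - 1) * variance(trt)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, diff / se, n1 + n2 - 2


diff, t, df = summary_statistic_analysis(subjects)
print(f"difference = {diff:.2f}, t = {t:.3f} on {df} df")
```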

    My manager at the time I ran the group understood linear and multiple regression, and sometimes would insist that I run the data through a regression procedure, just to see what would happen. I took the advice in good humor and learned to use it as an opportunity to explain how the right statistical procedure was so much better. :)


    One of my favorite seminar questions came at a very scholarly university statistics seminar: one of the faculty attendees asked the speaker whether they had a simple way, or rule of thumb, to quickly check whether the calculations in a very complicated statistical model were correct.

    Another technique that occasionally works is to try to explain/describe the statistical model/results to someone who is not a statistician. Questions from a naive audience can sometimes uncover important flaws.


    -------------------------------
    Chris Barker, Ph.D.
    President - San Francisco Bay Area Chapter of the American Statistical Association
    www.barkerstats.com
    -------------------------------------------







  • 7.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 18:22
    I think trying a different package is useful. If the packages use different parameterizations, alternative features, etc., it can make you think about why the answers are different, and by doing so you keep 'checking, checking, and checking.' Especially if you don't have a statistician colleague to discuss the problem with, it's good to keep thinking about the problem in different ways in order to gain some confidence that you aren't missing anything.

    -------------------------------------------
    Daniel Jeske
    University of California Department of Statistics
    -------------------------------------------








  • 8.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 18:53
    Okay, I will buy that it could be useful in some situations. But it has limitations and is not the ideal way to check whether you did something correctly.

    -------------------------------------------
    Michael Chernick
    Director of Biostatistical Services
    Lankenau Institute for Medical Research
    -------------------------------------------








  • 9.  RE:assuring the quality of statistical deliverables

    Posted 07-15-2011 09:30
    Most of the packages run test sets against each other, so another package is of much less use than one might think.
    I will run a method with different assumptions, e.g. a non-parametric procedure; the rank methods are easy to apply, as are the permutation methods in most of the packages (e.g. SAS and Stata).
    But you need to make sure that you preserve the correlation structure with a non-parametric method, and this really makes you think about your assumptions.

    It's always great when the result is fairly independent of distributional assumptions.  Examining the effect of points that are not extreme enough to be outliers, yet extreme enough to be influential also forces an evaluation of the results, the data characteristics and the model.

    I have a particular preference for plots as diagnostic and validation tools.

    They also make you think about the structure and quality of the data, which is the other most likely source of problems and the hardest to detect: is the 0/1 code for m/f or for f/m, and is there an occasional 2 put in by a different coder? Recently I ran into the question of whether we had used the correct codes for typical patients, for those who had peri-operative strokes, and for those who have post-operative epilepsy. Note the hidden question of time of occurrence.
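    A simple codebook check catches the "occasional 2" kind of problem mentioned above. This sketch assumes a hypothetical codebook in which sex is coded 0/1; it flags any value outside the allowed set, including a missing field. It cannot detect a consistent m/f versus f/m swap; that needs a cross-check against another source, such as a frequency table compared with the expected proportions.

```python
from collections import Counter

ALLOWED_SEX_CODES = {0, 1}  # hypothetical codebook: 0 = male, 1 = female


def check_codes(records, field, allowed):
    """Return (row index, value) for every record whose code is not in the codebook."""
    return [(i, r.get(field)) for i, r in enumerate(records)
            if r.get(field) not in allowed]


records = [
    {"id": 1, "sex": 0},
    {"id": 2, "sex": 1},
    {"id": 3, "sex": 2},  # a stray code entered by a different coder
    {"id": 4, "sex": 1},
]

print(Counter(r["sex"] for r in records))  # frequency table of the codes
print(check_codes(records, "sex", ALLOWED_SEX_CODES))  # [(2, 2)]
```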

    Ray



    -------------------------------------------
    Raymond Hoffmann
    Professor
    Medical College of Wisconsin
    -------------------------------------------








  • 10.  RE:assuring the quality of statistical deliverables

    Posted 07-15-2011 14:58
    For many years, until 1997 or so, Jon Shuster was PI or co-PI for the UF-based Statistical Office of the NCI's Pediatric Oncology Group (POG), which coordinated dozens of cancer trials concurrently. He once told me that no statistical report or memo was ever sent out until it had been reviewed by at least one other POG statistician. This exemplifies another aspect of statistical quality: no stat work is done well until it's reported well.

    -------------------------------------------
    Ralph O'Brien
    Case Western Reserve University
    -------------------------------------------








  • 11.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 15:55
    Dear Marlene,

    Good question. We require that any major piece of work sent to a client be reviewed by at least one other statistician before it goes out. We also have a "3-month rule": the documentation must be such that, when we or the client return to the document or deliverable after three months, we can all understand it as a stand-alone document.

    In carrying out our statistical work, we also try to create a data example for which we know the outcome or can closely predict it. We then run the fake data through the process to check that the expected did happen.
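    Running fake data with a known answer through the process can be sketched as follows: simulate data from a model whose parameters are fixed in advance, fit it with the routine being checked, and confirm that the estimates land near the truth. Here the routine is a hypothetical hand-written straight-line fit; the true intercept 2, slope 3, noise SD 0.5, seed and tolerances are all illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


# Fake data built so that the right answer is known in advance: a = 2, b = 3.
rng = random.Random(2011)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [2 + 3 * x + rng.gauss(0, 0.5) for x in xs]

a_hat, b_hat = fit_line(xs, ys)
# Tolerances are several standard errors wide for this sample size and noise level.
assert abs(a_hat - 2) < 0.3 and abs(b_hat - 3) < 0.05, (a_hat, b_hat)
print(f"intercept {a_hat:.3f}, slope {b_hat:.3f}")
```

    A noise-free version of the same data (for example, y = 2 + 3x exactly) should return the true values to within rounding, which separates errors in the fitting code from ordinary sampling variation.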

    We often do diagnostic and descriptive graphics that may not reach the client but which are useful in detecting problems or errors and in becoming familiar with the data. The data have a "personality" which it is important to get to know so that intuition and various other non-reasoning processes can contribute. 

    We also ask whoever is doing the statistical work to look it over for reasonableness. I.e., do the results make sense? Occasionally when we have outside people do work for us, they have just done the analysis, formatted it and sent it back to us without carefully reviewing it. We sometimes end up having to make corrections to these pieces of work. Often these changes are in the cover text that explains what was done. 

    Whatever you do, it is important to create a habit of care and checking, checking, checking, and, oh yes, checking. I think that the main "standard" for our operation is concern and care. This takes manifold forms that cannot be put into a list.

    I hope that this is helpful. It will be interesting to hear what others say.

    Best wishes,

    Nayak

    -------------------------------------------
    Nayak Polissar
    Consultant
    The Mountain Whisper Light
    -------------------------------------------








  • 12.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 16:23

    A difficult circumstance arises when the algorithm is so complex that it's hard to know if it's correct or not.  Look at any issue of JASA or Biometrics and you will see analyses like that.  If the result is counter-intuitive, or disagrees with a simpler analysis, does that mean that the complex analysis is wrong?  There are unfortunate stories about this, such as analyses of the relation of air pollution to health (submitted to Congress but subsequently retracted) and the recent front-page scandals about genomic screening in cancer chemotherapy.  Huge data sets, complex algorithms: how can the analyst check his/her work and how can the reader believe the results?  I worry that our tools have become too complex for us to understand how they work or what they produce.

     

               Larry Muenz, statistical consultant


    -------------------------------------------
    Larry Muenz
    -------------------------------------------








  • 13.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 16:42

    Hi Marlene:

    We create "user requirements" for the programming in the form of a document outlining, in pseudocode, the programming steps required to go from the raw data to the analysis data sets; this document is reviewed against the statistical analysis plan. Once it is finalized, two independent statisticians (or trained programmers) create the final data sets following this document; the data sets are compared, and where differences are encountered the lead statistician resolves the issues. This is repeated until the final analysis data sets are identical.

    From a risk-analysis standpoint, this is the place where an error could be the most profound and detecting it the most difficult. The model, with its options justified, and the decision output can be reviewed by a second statistician, as can the report.
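    The comparison step in this double-programming workflow plays roughly the role PROC COMPARE plays in SAS. The Python sketch below is a hypothetical, simplified version: each independently derived data set is a list of records keyed by an ID field, and every difference (records present in only one data set, or mismatched values beyond a numeric tolerance) is listed for the lead statistician to resolve. An empty result means the two data sets agree.

```python
def compare_datasets(first, second, key, tol=1e-9):
    """Compare two independently derived analysis data sets record by record.

    `first` and `second` are lists of dicts; `key` names the field that
    identifies a record (e.g. a subject ID). Returns a list of discrepancy
    descriptions; an empty list means the two data sets agree.
    """
    problems = []
    a = {row[key]: row for row in first}
    b = {row[key]: row for row in second}
    for k in sorted(a.keys() - b.keys()):
        problems.append(f"{key}={k}: only in first dataset")
    for k in sorted(b.keys() - a.keys()):
        problems.append(f"{key}={k}: only in second dataset")
    for k in sorted(a.keys() & b.keys()):
        for field in sorted(a[k].keys() | b[k].keys()):
            va, vb = a[k].get(field), b[k].get(field)
            both_numeric = isinstance(va, (int, float)) and isinstance(vb, (int, float))
            same = abs(va - vb) <= tol if both_numeric else va == vb
            if not same:
                problems.append(f"{key}={k}, {field}: {va!r} vs {vb!r}")
    return problems


# Hypothetical output of two independent programmers.
prog1 = [{"id": 1, "age": 34, "arm": "A"}, {"id": 2, "age": 51, "arm": "B"}]
prog2 = [{"id": 1, "age": 34, "arm": "A"}, {"id": 2, "age": 15, "arm": "B"}]
print(compare_datasets(prog1, prog2, "id"))  # ['id=2, age: 51 vs 15']
```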

    Hope this helps
    -------------------------------------------
    Janet McDougall
    President
    McDougall Scientific Ltd
    -------------------------------------------








  • 14.  RE:assuring the quality of statistical deliverables

    Posted 07-14-2011 16:56


    -------------------------------------------
    J. Dobbins
    Delmarva Foundation
    -------------------------------------------
    Marlene,

    This is an important aspect of things in the business world, so I am not surprised your students are asking, especially if they also have jobs.

    Many companies, foundations, government contractors and others are ISO (International Organization for Standardization) certified, and this requires a formal process document that includes quality assurance of the product, whether that is a computer or a knowledge product such as a statistical projection based on a survey, or an audit of something such as payments requested by and paid to a supplier of services such as a hospital. Regardless, to stay in business and be competitive you need a good QMS (Quality Management System) that includes these components. So a product involving statistical analysis, the development of a predictive model, or a SAS program generally must follow a process that includes a statement of the problem or design of the program, peer review at each major stage or milestone, and a final signoff. These quality records must be maintained for internal business purposes, in case of legal developments, or whatever. They include the above as well as checklists.

    If your business depends on government or state contracts, then such a QMS is required to be competitive and win contracts. So QA or QC is very important and good for your students to know about, as well as the subject matter of the class.

    If the product is, for example, SAS code, then a good reference is Testing Computer Software by Kaner et al. This and perhaps numerous newer texts cover aspects and details of steps such as

    • Requirements Definition
    • Design
    • Development ( waterfall, prototyping, etc )
    • Testing (black box, proofs, etc)
    • Documentation
    • Training
    • Release for General Use By DA Staff
    • Revision control

    Would be happy to talk more with you if this is relevant for you.  I have often thought SAS should have a course on this and may have suggested it at one time.  Maybe they have one. Would be a good idea.

    This is a big and interesting subject.

    Hurriedly,


    Greg Dobbins






  • 15.  RE:assuring the quality of statistical deliverables

    Posted 07-15-2011 08:59

    In terms of resources available for your students to review, I can think of a few sources off the top of my head; there are probably more. First is the book "Data Quality and Record Linkage Techniques" by Herzog, Scheuren, and Winkler (Springer 2007). In the context of data quality, the book provides some useful tools for checking and documenting quality. Deming's work on quality is also a very good source of advice on best practices.

    At JSM 2008 (Denver) there was a Statistical Applications in Business session that had some papers that might be of interest to your students. You can find the papers in the online proceedings: look up session 502 in the program book to find the author names, and search on those. In particular, there is a paper by Susan Hinkins and me that talks about documentation practices.


    -------------------------------------------
    Edward Mulrow, PStat®
    Senior Statistician
    NORC
    -------------------------------------------








  • 16.  RE:assuring the quality of statistical deliverables

    Posted 07-15-2011 14:33
    I second the notion of implementing twice (different people and/or different languages), but I have some other ideas which have increased the quality of my own work enormously.  Most of these are taken directly from the Software Engineering world.  We are writing code, after all.

    First, make your code highly modular.  Coding is fundamentally about managing complexity.  Break tasks into sub-tasks, and so forth until at the lowest level you have small bits of code, each of which does only one thing.  5 levels of functions calling functions would not be unusual.  I use R, which manages functions very well.  Unfortunately, I believe SAS does not support this style of programming well, mainly because there is only one namespace for the workspace and all macros.  I think SAS encourages one towards a monolithic style with a few large macros rather than many small functions.  So my unconventional recommendation in order to achieve quality is to stop using SAS.  JMP's programming language looks pretty good, and I'm not in a place to comment about other statistical programming languages.

    Second, develop unit tests for the functions. This amounts to checking, which you will do anyway, but after the fact. Just do it beforehand instead. It's really much more pleasant to do this checking early, in a relaxed state of mind, than to check results just before a deadline, when you're getting anomalous results! Unit testing also drives good modular design: functions that are hard to test typically come from poorly factored design. Factor your problem to make functions easy to test, and you're probably factoring your code well. It can be helpful to use a unit-testing framework such as RUnit to evaluate a large number of tests automatically. Once testing is automatic, it's easy to do it frequently (several times an hour, as you write code).

    Once you start testing you'll find that certain kinds of errors are more prevalent than others. Pay attention to this. For instance, R has matrices and data frames. They both hold data in spreadsheet-like formats, and under some operations they behave identically. But under other operations they give different results, and it's hard to remember which behaves how. Do you intend your code to work with either? Then give test cases with each. Or if you support only one, do you include error traps to exclude the one that doesn't work? The bottom line is that this is an area ripe for errors, and it's good to know where such areas are and to generate tests accordingly.
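    RUnit is the R framework recommended here; since the sketches in this thread are written in Python, this one uses Python's standard unittest module as an analogue. standardize() is a hypothetical small function of the kind that sits at the bottom of a modular hierarchy, and the tests cover a hand-worked answer, a general property, and an error trap for input the function should refuse.

```python
import unittest
from statistics import mean, stdev


def standardize(values):
    """Return z-scores using the sample SD; reject input that cannot be standardized."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("need at least two values")
    sd = stdev(values)
    if sd == 0:
        raise ValueError("values have zero variance")
    m = mean(values)
    return [(v - m) / sd for v in values]


class TestStandardize(unittest.TestCase):
    def test_known_answer(self):
        # Worked by hand: mean 2, sample SD 1.
        self.assertEqual(standardize([1, 2, 3]), [-1.0, 0.0, 1.0])

    def test_result_has_mean_zero_and_sd_one(self):
        z = standardize([3.2, 7.7, 1.5, 9.9, 4.4])
        self.assertAlmostEqual(mean(z), 0.0)
        self.assertAlmostEqual(stdev(z), 1.0)

    def test_constant_input_is_rejected(self):
        with self.assertRaises(ValueError):
            standardize([5, 5, 5])


if __name__ == "__main__":
    # Run the suite without calling sys.exit, so it can be re-run as you edit.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStandardize)
    unittest.TextTestRunner(verbosity=2).run(suite)
```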

    I started using these practices years ago when a large, high-profile, programming-intensive project was getting underway.  I realized that my previous practice of write first, check later (especially to track down odd results) would not scale up.  As complexity and size grows, the number of possible bugs, and the volume of code one has to investigate, becomes such that I would lose all confidence in my results.  Plus, I could imagine running into an issue the afternoon before a deadline, and spending all night trying to track things down.  So I started unit-testing every function, and my code quality improved immediately.  I discovered bugs before they happened.  Anomalous cases disappeared, except very rarely for bizarre "corner cases" that no one would think of in advance.  And my colleagues had come to regard my code quality with such respect that when one of these rare cases did happen, it was the subject of comment.  In other words, this approach has worked for me.

    Documenting this quality is a related question, but improving actual quality (even if it's not documented) is still worth it.

    To check the correctness of a statistical routine, I find applying it to simulated data where there is a known correct answer, or perhaps doing this with thousands of simulated data sets, is very valuable.
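    Checking a routine against thousands of simulated data sets can be sketched as a coverage study: generate data with a known true mean, compute a 95% interval for each data set, and count how often the interval contains the truth. The routine here is a hypothetical known-sigma z-interval (chosen because Python's standard library provides the normal quantile via statistics.NormalDist); all settings are illustrative. A coverage far from 0.95 would point to a bug or a violated assumption.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def z_interval(values, sigma, level=0.95):
    """Confidence interval for a mean when the population SD sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(len(values))
    m = mean(values)
    return m - half_width, m + half_width


def simulated_coverage(n_datasets=2000, n=20, true_mean=10.0, sigma=2.0, seed=1):
    """Fraction of simulated data sets whose interval contains the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_datasets):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma)
        hits += lo <= true_mean <= hi
    return hits / n_datasets


print(simulated_coverage())  # should be close to 0.95
```

    With 2,000 data sets the Monte Carlo standard error of the coverage is about 0.005, so values within roughly 0.94 to 0.96 are unremarkable.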

    One more thing:  recently I've begun using Sweave (in R) to create reports.  This removes copying and pasting altogether, and makes a hard, transparent link between code and result.  It utterly removes any ambiguity about what data and what code generated what result.  It's absolutely golden for transparency and traceability, so I've begun using it for all critical reports.  It doesn't mean your report is correct, but it does mean that it's transparent, so if concerns about errors arise you can check them out with confidence.
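    Sweave itself weaves R code chunks into a LaTeX document. The Python sketch below only illustrates the underlying principle: the numbers in the report text are filled in directly from the computation, so there is no copying and pasting between output and report. The template wording, the variable being summarized and render_report() are hypothetical.

```python
from statistics import mean, stdev
from string import Template

# Hypothetical report sentence; the placeholders are filled from computed results.
REPORT = Template("Mean response was $m (SD $sd, n = $n).")


def render_report(values):
    """Fill the report text directly from the computed results (no copy-paste)."""
    return REPORT.substitute(m=f"{mean(values):.1f}",
                             sd=f"{stdev(values):.1f}",
                             n=len(values))


print(render_report([120, 130, 125, 135]))
# Mean response was 127.5 (SD 6.5, n = 4).
```

    If the data change, rerunning the script regenerates every number in the sentence, which is the traceability property the post values.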

    Incidentally, Sweave addresses a "reproducible research" problem that has become more pervasive as models and algorithms have gotten more complex, particularly around identifying biomarkers.  Every now and then someone shows me an article on identified biomarkers and asks me, "Do you think this is valid?"  Naturally the article is in a medical journal where the authors spend a page describing the specimen processing protocol, but one paragraph of text describing their unique marker-selection algorithm.  So my answer is, "I have no earthly idea what they actually did!"  If people published code with their results, and not just *some* code but *the* code that generated the article, we'd probably be seeing more reproducible results in the biomarker-identification area.

    In short, if I were teaching, my recommendations to my students would be
    1. Use R.
    2. Factor your problem into a hierarchy of progressively more specific functions.
    3. Unit test, use RUnit.
    4. Coding is a craft that should be developed continually
    5. Use Sweave to write reports
    6. Use simulation to determine if statistical routines are working "well" (if not correctly).
    7. Oh, and take snapshots of your code using version control software such as SVN.
    -Jim

    -------------------------------------------
    James Garrett
    Manager, R&D Statistics
    Becton Dickinson
    -------------------------------------------