I second the notion of implementing twice (different people and/or different languages), but I have some other ideas which have increased the quality of my own work enormously. Most of these are taken directly from the Software Engineering world. We are writing code, after all.
First, make your code highly modular. Coding is fundamentally about managing complexity. Break tasks into sub-tasks, and so forth until at the lowest level you have small bits of code, each of which does only one thing. Five levels of functions calling functions would not be unusual. I use R, which manages functions very well. Unfortunately, I believe SAS does not support this style of programming well, mainly because there is only one namespace for the workspace and all macros. I think SAS encourages one towards a monolithic style with a few large macros rather than many small functions. So my unconventional recommendation in order to achieve quality is to stop using SAS. JMP's programming language looks pretty good, and I'm not in a position to comment on other statistical programming languages.
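As a rough illustration of this layering in R, here is a minimal sketch. The task and all function names (`drop_missing`, `standardize`, `clean_and_standardize`, `count_extreme`) are hypothetical, chosen only to show small single-purpose functions composed into higher-level ones.

```r
# Hypothetical example: each function does one thing, and higher-level
# functions are built by composing lower-level ones.

# Lowest level: drop missing values.
drop_missing <- function(x) {
  x[!is.na(x)]
}

# Lowest level: center and scale a numeric vector.
standardize <- function(x) {
  (x - mean(x)) / sd(x)
}

# Middle level: clean the data, then standardize it.
clean_and_standardize <- function(x) {
  standardize(drop_missing(x))
}

# Top level: count standardized values beyond a cutoff.
count_extreme <- function(x, cutoff = 2) {
  z <- clean_and_standardize(x)
  sum(abs(z) > cutoff)
}

count_extreme(c(1, 2, 3, NA, 2, 1, 3, 2, 50))
```

Because each piece is tiny, each can be checked on its own before it is trusted inside the next level up.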
Second, develop unit tests for the functions. This amounts to checking, which you will do anyway, but after the fact. Just do it beforehand instead. It's really much more pleasant to do this checking early, in a relaxed state of mind, than to check results just before a deadline, when you're getting anomalous results! Unit testing also drives good modular design: functions that are hard to test typically come from poorly-factored design. Factor your problem to make functions easy to test, and you're probably factoring your code well. It can be helpful to use a unit-testing framework such as RUnit to run a large number of tests automatically. Once it's automatic, it's easy to do testing frequently (several times an hour, as you write code).
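To make the idea concrete, here is a minimal sketch of a function and its tests, written in base R with `stopifnot()` so that it runs without any add-on package. The function `geometric_mean` and the test function `test_geometric_mean` are hypothetical examples, not part of any library.

```r
# Hypothetical function under test: geometric mean of positive numbers.
geometric_mean <- function(x) {
  if (!is.numeric(x) || any(x <= 0)) {
    stop("x must be a numeric vector of positive values")
  }
  exp(mean(log(x)))
}

# Tests written alongside the function, before it is used in an analysis.
test_geometric_mean <- function() {
  # Known answer: the geometric mean of 2 and 8 is 4.
  stopifnot(isTRUE(all.equal(geometric_mean(c(2, 8)), 4)))
  # A constant vector returns that constant.
  stopifnot(isTRUE(all.equal(geometric_mean(rep(3, 5)), 3)))
  # Invalid input must raise an error rather than return a number.
  result <- tryCatch(geometric_mean(c(1, -1)), error = function(e) "error")
  stopifnot(identical(result, "error"))
  invisible(TRUE)
}

test_geometric_mean()
```

In RUnit, I would expect the same checks to be written with `checkEquals()` and `checkException()` and collected into a test suite, so that the whole set can be rerun with one call.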
Once you start testing you'll find certain kinds of errors are more prevalent than others. Pay attention to this. For instance, R has matrices and data frames. They both hold data in spreadsheet-like formats, and under some operations they behave identically. But under other operations they give different results, and it's hard to remember which behavior applies when. Do you intend your code to work with either? Then give test cases with each. Or if you support only one, do you include error traps to exclude the one that doesn't work? The bottom line is that this is an area ripe for errors, and it's good to know where such areas are and to generate tests accordingly.
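A small sketch of the matrix versus data frame trap, with one way to handle it. The helper `column_ranges` is a hypothetical function invented for this illustration; it accepts either type by converting up front and traps anything else.

```r
m  <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
df <- as.data.frame(m)

# Same behavior: both report 2 rows and 3 columns.
dim(m)
dim(df)

# Different behavior: length() counts cells of a matrix
# but columns of a data frame.
length(m)    # 6
length(df)   # 3

# Different behavior: selecting one row of a matrix drops to a plain
# vector, while selecting one row of a data frame stays a data frame.
is.null(dim(m[1, ]))      # TRUE
is.data.frame(df[1, ])    # TRUE

# Hypothetical function meant to work with either type.
column_ranges <- function(x) {
  if (is.data.frame(x)) x <- as.matrix(x)
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("x must be a numeric matrix or data frame")
  }
  apply(x, 2, function(col) max(col) - min(col))
}

# Test cases with each supported type, plus one for the error trap.
stopifnot(all(column_ranges(m) == column_ranges(df)))
stopifnot(inherits(try(column_ranges("text"), silent = TRUE), "try-error"))
```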
I started using these practices years ago when a large, high-profile, programming-intensive project was getting underway. I realized that my previous practice of write first, check later (especially to track down odd results) would not scale up. As complexity and size grow, the number of possible bugs, and the volume of code one has to investigate, become such that I would lose all confidence in my results. Plus, I could imagine running into an issue the afternoon before a deadline, and spending all night trying to track things down. So I started unit-testing every function, and my code quality improved immediately. I caught bugs before they could affect any results. Anomalous cases disappeared, except very rarely for bizarre "corner cases" that no one would think of in advance. And my colleagues came to regard my code quality with such respect that when one of these rare cases did happen, it was the subject of comment. In other words, this approach has worked for me.
Documenting this quality is a related question, but improving actual quality (even if it's not documented) is still worth it.
To check the correctness of a statistical routine, I find applying it to simulated data where there is a known correct answer, or perhaps doing this with thousands of simulated data sets, is very valuable.
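Here is a minimal sketch of that kind of check. It uses `t.test()` as a stand-in for the routine under test; the true mean, sample size, number of simulations and seed are all arbitrary choices made for illustration. If the routine is working, roughly 95% of the intervals should cover the known true mean.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

true_mean <- 10   # the known correct answer
n_sims <- 2000    # number of simulated data sets

# For each simulated data set, record whether the 95% confidence
# interval reported by t.test() covers the true mean.
covers <- replicate(n_sims, {
  x <- rnorm(25, mean = true_mean, sd = 3)
  ci <- t.test(x)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Observed coverage; this should be close to the nominal 0.95.
coverage <- mean(covers)
coverage
```

A coverage far from 0.95 would point to a problem in the routine, or in how it is being called, even when no single result looks obviously wrong.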
One more thing: recently I've begun using Sweave (in R) to create reports. This removes copying and pasting altogether, and makes a hard, transparent link between code and result. It utterly removes any ambiguity about what data and what code generated what result. It's absolutely golden for transparency and traceability, so I've begun using it for all critical reports. It doesn't mean your report is correct, but it does mean that it's transparent, so if concerns about errors arise you can check them out with confidence.
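For readers who have not seen Sweave, here is a minimal sketch of the mechanism. It writes a tiny source file to a temporary directory and weaves it. The file name `report.Rnw`, the chunk label and the data are hypothetical. The number in the report text comes from `\Sexpr{}`, so it is computed by the code at weave time rather than pasted in.

```r
# A tiny Sweave source file: LaTeX text with an embedded R code chunk.
rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<analysis, echo=TRUE>>=",
  "x <- c(4.1, 5.3, 6.2, 5.8)",
  "m <- mean(x)",
  "@",
  "The sample mean is \\Sexpr{round(m, 2)}.",
  "\\end{document}"
)

# Work in a temporary directory so no files are left behind.
old_dir <- setwd(tempdir())
writeLines(rnw, "report.Rnw")

# Sweave() runs the chunk and writes report.tex with the code,
# its output, and the computed value substituted into the text.
Sweave("report.Rnw")
tex <- readLines("report.tex")
setwd(old_dir)

# Show the sentence whose number was filled in by the code.
grep("The sample mean is", tex, value = TRUE)
```

The resulting `.tex` file is then compiled with LaTeX as usual. If the data or the code changes, weaving again updates every number in the report.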
Incidentally, Sweave addresses a "reproducible research" problem that has become more pervasive as models and algorithms have gotten more complex, particularly around identifying biomarkers. Every now and then someone shows me an article on identified biomarkers and asks me, "Do you think this is valid?" Naturally the article is in a medical journal where the authors spend a page describing the specimen processing protocol, but one paragraph of text describing their unique marker-selection algorithm. So my answer is, "I have no earthly idea what they actually did!" If people published code with their results, and not just *some* code but *the* code that generated the article, we'd probably be seeing more reproducible results in the biomarker-identification area.
In short, if I were teaching, my recommendations to my students would be:
- Use R.
- Factor your problem into a hierarchy of progressively more specific functions.
- Unit test, using a framework such as RUnit.
- Treat coding as a craft that should be developed continually.
- Use Sweave to write reports.
- Use simulation to determine if statistical routines are working "well" (if not correctly).
- Oh, and take snapshots of your code using version control software such as SVN.
-Jim
-------------------------------------------
James Garrett
Manager, R&D Statistics
Becton Dickinson
-------------------------------------------