Statisticians in the medical product industry (pharmaceuticals, biologics, devices and diagnostics) operate within a highly regulated work environment. Because of this, there are scores of standard operating procedures and guidances that specify how studies should be designed, conducted, analyzed and reported. For example, there are International Conference on Harmonisation (ICH) guidances for Good Clinical Practice (GCP) and Good Manufacturing Practice (GMP), which describe how we (should) monitor clinical trials and manufacture drug product, respectively [1,2]. But what about Good Analytical Practice?
Many would say that Good Analytical Practice is implied by ICH E9: Statistical Principles in Clinical Trials [3]. The details of our experiment (the clinical trial) are finalized in a written protocol prior to beginning the study. Blinding of treatment assignment is recommended where possible. A written statistical analysis plan detailing methodologies and data summaries is completed prior to database lock and unblinding. A small number of primary endpoints are specified, with strict attempts to control type I error. After unblinding, analyses are performed and study reports are written, documenting any deviations from the original analysis plan.
This is a great start. However, can ICH E9 be considered complete?
For example,
- The analysis is only as good as the data
- There is no discussion of the database lock process, such as what checks should be performed, who the signatories should be, and the implications of altering the database after it is considered final
- There are no specific details on data quality, though the GCP document is referenced. However, details for ongoing data quality assessments should probably be highlighted [4]
- There is limited discussion of data validation and what it entails: “Once data validation is complete, the analysis can proceed…”
- The analysis is only as good as the analysis
- There are data, such as those collected from case report forms or diaries (what would comprise CDISC SDTM domains), and there are analysis data that may involve windowing, imputation, or other complex derivations (what would comprise CDISC ADaM). There is no mention of validation other than the statement above, and there doesn’t appear to be a distinction between these two kinds of data. However, as general practice in the industry, two sets of independent programming are performed to generate a given data set using pre-defined specifications (competing code validation). The final data sets are compared to identify differences, and the programming is refined until no differences remain. Such rigor is important, since certain data or conditions may cause one (or both) implementations to fail.
- Similar validation approaches are often applied to the analyses themselves; that is, two sets of independent programming are performed to generate each table and figure. This is important for several reasons: complex derivations can often be performed in different ways with differing outcomes, statistical software often has numerous options available, and sometimes people make mistakes. What many may fail to realize is that translating a written analysis plan into analysis code is often subject to some level of interpretation.
- Multiplicity is mentioned, but there are no best practices as to how to report it. For example, should only raw p-values be presented in the document with the appropriate alpha comparison left to the reader, or should multiplicity-adjusted p-values be presented?
- There is no discussion of good practices for data visualization.
- There is no discussion on the reporting of results. In my past experience in the industry, any document that went outside the company went through multi-disciplinary review with a validation of contents.
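The competing code validation described above can be sketched in a few lines. This is a hypothetical illustration (the derivation rule, variable names, and tolerance are invented for the example; in practice the two programs are written independently, often in SAS, from the same written specifications):

```python
# Hypothetical sketch of competing-code (double-programming) validation.
# Specification: percent change from baseline = 100 * (post - baseline) / baseline.

def derive_a(records):
    """Programmer A's implementation of the specification."""
    return {r["subject"]: 100.0 * (r["post"] - r["baseline"]) / r["baseline"]
            for r in records}

def derive_b(records):
    """Programmer B's independent implementation, written separately."""
    out = {}
    for r in records:
        out[r["subject"]] = (r["post"] / r["baseline"] - 1.0) * 100.0
    return out

def compare(a, b, tol=1e-9):
    """Report subjects where the two derivations disagree (or are missing)."""
    diffs = []
    for subject in sorted(set(a) | set(b)):
        va, vb = a.get(subject), b.get(subject)
        if va is None or vb is None or abs(va - vb) > tol:
            diffs.append((subject, va, vb))
    return diffs

data = [
    {"subject": "001", "baseline": 50.0, "post": 60.0},
    {"subject": "002", "baseline": 80.0, "post": 72.0},
]
print(compare(derive_a(data), derive_b(data)))  # [] -> implementations agree
```

Note that the two implementations differ only in floating-point round-off here; the tolerance exists precisely so that validation flags real specification disagreements rather than numerical noise.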
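On the multiplicity question above, one option is to report adjusted p-values directly. A minimal sketch of the Holm step-down adjustment (the raw p-values are made up; packages such as statsmodels provide this, but the arithmetic is simple enough to show):

```python
def holm_adjust(pvalues):
    """Holm step-down multiplicity adjustment.

    Returns adjusted p-values in the original order: visit the raw
    p-values from smallest to largest, scale the k-th smallest by
    (m - k), enforce monotonicity with a running maximum, and cap at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

raw = [0.01, 0.04, 0.03]
print([round(p, 6) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06]
```

An adjusted p-value can then be compared to the nominal alpha by the reader, with no mental bookkeeping about which comparison in the family it belongs to.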
It may be worthwhile to consider what other important points should be summarized and included to define a single document on Good Analytical Practice. Going further, perhaps we should also review where the industry may fall short. For example, are treatment codes withheld from the study team for open-label trials? Is the primary endpoint withheld from the study team for a single arm trial? These are good practices to limit potential bias. In general, however, the medical product industry gets a lot of things right for Good Analytical Practice, but there is always room for improvement.
But what of other research? I’d like to propose that there is value in defining basic Good Analytical Practice for other industries and areas of research. For example, the American Statistical Association (ASA) recently released a statement and numerous commentaries (see the supplemental information) on the use and interpretation of p-values [5]. While this document, or something like it, has been needed for quite some time, the outright ban of p-values in one journal is likely responsible for bringing this topic to the forefront [6]. Encouraging greater use and putting greater emphasis on confidence intervals is one important recommendation.
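As a small illustration of that recommendation, a confidence interval carries both the effect size and its precision, where a p-value alone does not. A sketch for a difference in means using a normal approximation (the summary statistics and arm labels are invented for the example):

```python
import math

def mean_diff_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Approximate 95% CI for a difference in means (normal approximation)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return diff - z * se, diff + z * se

# Invented summary statistics for two treatment arms.
lo, hi = mean_diff_ci(mean1=12.0, sd1=4.0, n1=100, mean2=10.0, sd2=4.0, n2=100)
print(f"difference = 2.0, 95% CI ({lo:.2f}, {hi:.2f})")  # (0.89, 3.11)
```

Reporting "difference 2.0, 95% CI (0.89, 3.11)" tells the reader the plausible range of effects, not just whether a threshold was crossed.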
Other analytical practices that should see wider use in other areas of research include independent validation of data sets and analyses. Two recent examples are highlighted in [7]. The Nevins-Potti example refers to a microarray study of chemosensitivity conducted at Duke University and highlights the benefit of independent analysis for uncovering misconduct [8]. However, this independent analysis was only possible because the data were in the public domain (and, initially, the findings went ignored). The Reinhart-Rogoff controversy involved economic austerity policies. Not only were there errors in the analysis (which validation could have identified), but “unconventional weighting” contributed heavily to the final results [7]. (Another important lesson: report all analysis assumptions.)
Many may argue that there are insufficient resources for this additional work, but some progress can be made. For example, independent validation of data and analyses for primary endpoints and hypotheses is a good first step. A statement that such validation occurred would provide additional confidence in the accuracy of reported results and conclusions. (But how to verify? That’s a good question.) Similar approaches can be applied to any findings that are contrary to current understanding.
I am grateful to be part of an industry with such strong emphasis on Good Analytical Practice. But it is important to extend these practices to other industries and areas of research as described above. Ideally, this will lead to higher quality research in the literature and fewer retracted articles [9,10]. Research findings can have huge implications in economic (Reinhart-Rogoff) and medical practice. Take, for example, the fraudulent 1998 paper published in The Lancet by Andrew Wakefield and associates linking autism to the MMR vaccine [11]. (Another important lesson: always have controls.) This manuscript has contributed to reduced vaccination rates and the re-emergence of diseases long thought to be eradicated. While this particular paper described a small case study, it does emphasize the important role journal editors and reviewers have for Good Analytical Practice. Caroline White of the British Medical Journal provides details on a case of suspected research fraud and recommendations to journals for addressing it [12]. (Another important lesson: when in doubt, ask to see the data.)
Open questions for comments:
- Where does the medical product industry fall short for Good Analytical Practice?
- What topics are must-have in the (currently fictitious) Good Analytical Practice guideline?
References
- International Conference on Harmonisation. (1996). E6: Good Clinical Practice.
- International Conference on Harmonisation. (2000). Q7: Good Manufacturing Practice for Active Pharmaceutical Ingredients.
- International Conference on Harmonisation. (1998). E9: Statistical Principles in Clinical Trials.
- US Food and Drug Administration. (2013). Guidance for Industry Oversight of Clinical Investigations - A Risk-Based Approach to Monitoring.
- Wasserstein RL & Lazar NA. (2016). The ASA's statement on p-values: context, process, and purpose. The American Statistician, DOI: 10.1080/00031305.2016.1154108.
- Trafimow D & Marks M. (2015). Editorial. Basic and Applied Social Psychology 37: 1–2.
- Irizarry R, Peng R & Leek J. (2013, Apr 21). Nevins-Potti, Reinhart-Rogoff. Simply Statistics.
- Baggerly KA & Coombes KR. (2009). Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. The Annals of Applied Statistics 3: 1309-1334.
- Grieneisen ML & Zhang M. (2012). A comprehensive survey of retracted articles from the scholarly literature. PLoS ONE 7(10): 1-15.
- Ioannidis JPA. (2005). Why most published research findings are false. PLoS Med 2(8): e124. DOI: 10.1371/journal.pmed.0020124.
- Godlee F, Smith J & Marcovitch H. (2011). Wakefield’s article linking MMR vaccine and autism was fraudulent. British Medical Journal 342: 64-66.
- White C. (2005). Suspected research fraud: Difficulties of getting at the truth. British Medical Journal 331: 281–288.