The reproducibility crisis is well known in the scientific community, and groups like the NIH, the NSF, and professional societies are taking steps to address it. If a research funding agency asked the ASA for advice on its reproducible research plans, what would our recommendations be? Recently, ASA Science Policy staff asked several ASA members for their thoughts. I'm writing to open the current draft, still a work in progress, up to the broader community. Comments can be submitted in the space below or in this Google doc.
First, let me briefly summarize what NIH and NSF are doing. The NIH has taken several steps, as outlined in this 2015 blog post from NIH Deputy Director of Extramural Research Michael Lauer and this Office of Extramural Research website, Rigor and Reproducibility. This summer, NIH released a guide explaining how to address rigor and reproducibility in NIH applications. Its newly released plan for submitting clinical trial proposals is also seen as part of its rigor and reproducibility efforts. According to the NIH, scientific rigor is the “strict application of the scientific method to ensure robust and unbiased experimental design, methodology, analysis, interpretation and reporting of results”. Based on this definition, it is easy to see how statisticians can bring scientific research to a new level of rigor.
For its part, NSF submitted to OMB in late 2014 A Framework for Ongoing and Future National Science Foundation Activities. In 2015, NSF released a video of an interview with Brian Nosek, director of the Center for Open Science, discussing reproducibility and replicability. In addition, a subcommittee on Replicability in Science of the Advisory Committee to the Directorate for Social, Behavioral, and Economic Sciences (SBE) produced this report, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science, with nine recommendations (see the bottom of this blog post). In September, SBE released this Dear Colleague letter, Robust and Reliable Research in the Social, Behavioral, and Economic Sciences. Earlier this year, the GEO Directorate issued a Dear Colleague Letter affirming that they “continue to welcome proposals related to enhancing the validity of the data and outcomes of research in all GEO programs.”
On the professional society side, the Federation of American Societies for Experimental Biology has this document, Enhancing Research Reproducibility, that according to its blog post, addresses general factors impeding “the ability to reproduce experimental results as well as factors that specifically affect the use of two key tools critical to basic research: mouse models and antibodies. The report suggests actions for stakeholders across the research enterprise, including scientists, institutions, professional societies, journals, and federal agencies.”
This summary is not meant to be comprehensive. Indeed, we are just starting to explore how the ASA can raise the profile of statisticians' role in reproducible research, building on the extensive work that has been done, and is still being done, by statisticians. In addition to your comments on this document, we'd welcome your additions and comments to this quick (and therefore cursory) review of efforts underway by funding agencies and professional societies, especially if we have missed efforts from other organizations. We also welcome your suggestions for ASA activities to further complement and/or amplify the ongoing work of statisticians on reproducible research.
Contributors to this document include Karl Broman, Mine Cetinkaya-Rundel, Chris Paciorek, Roger Peng, Daniel Turek, and Hadley Wickham.
Informal Recommendations for Robust and Reliable Research
- Reproducibility: A study is reproducible if you can take the original data and the computer code used to analyze the data and reproduce all of the numerical findings from the study. This may initially sound like a trivial task, but experience has shown that it's not always easy to achieve this seemingly minimal standard.
- Replicability: This is the act of repeating an entire study, independently of the original investigator and without use of the original data (but generally using the same methods).
- Reproducibility is enhanced by following best current practices, including:
- Ideally, exclusive use of publicly available data. However, if the research domain does not allow for publicly available data for widely accepted reasons (e.g., medical data with high confidentiality concerns), the principles outlined in items (b) - (e) should still be followed;
- Use of version control for all (collaborative or individual) code development;
- Exclusive use of open-source software freely available to anyone in the world;
- End-to-end scripting of research, including data processing and cleaning, statistical analyses, visualizations, and report and/or manuscript generation, with the full workflow made available to others;
- Use of container/virtual machine tools to capture software versions, dependencies, and platform specifics;
- Publication of code in public repositories, as is done with data; and
- For projects that develop algorithms, implementing algorithms on standard computational platforms (e.g., R packages, Python packages, source code packages installable via standard methods, etc.).
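To make the end-to-end scripting practice in item (d) concrete, here is a minimal sketch in Python. The data, file contents, and analysis are all hypothetical; the point is only that cleaning, analysis, and reporting are each driven by code, so anyone with the raw data can rerun the entire pipeline and regenerate the reported numbers.

```python
"""Minimal sketch of an end-to-end scripted analysis (hypothetical example)."""
import csv
import io
import statistics


def clean(raw_rows):
    """Drop rows with missing measurements in code, not by hand-editing the file."""
    return [float(r["value"]) for r in raw_rows if r["value"] not in ("", "NA")]


def analyze(values):
    """Compute the summary statistics that would appear in the write-up."""
    return {"n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values)}


def report(results):
    """Generate the results text programmatically rather than by copy-paste."""
    return "n = {n}, mean = {mean:.2f}, sd = {sd:.2f}".format(**results)


def run_pipeline(raw_csv_text):
    """Raw data in, finished summary out: one call reproduces everything."""
    rows = csv.DictReader(io.StringIO(raw_csv_text))
    return report(analyze(clean(rows)))


if __name__ == "__main__":
    # Stand-in for reading a raw data file kept under version control.
    raw = "id,value\n1,2.0\n2,NA\n3,4.0\n4,6.0\n"
    print(run_pipeline(raw))
```

In a real project, `run_pipeline` would read from the published raw data file and write the figures and manuscript numbers, and the script itself would live in the same version-controlled repository as the data.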
- Reproducibility shouldn't be thought of as a binary state: either reproducible or not reproducible. It's more useful to think of it as a continuum from hard to reproduce to easy to reproduce. The goal of any reproducibility effort should broadly be to move as many people as possible toward easy-to-reproduce. This involves both technological components (making the right thing easier than the wrong thing) and social components (giving people the activation energy to learn a better process even though it's harder in the short term).
- It’s perhaps worth noting that in this era of “replication crisis”, reproducibility is the only thing that can be effectively guaranteed in a published study. Whether any claimed findings are indeed true or false can only be confirmed via additional studies, but reproducibility can be confirmed immediately.
- There are several important barriers to researchers making their research reproducible.
- Lack of skill with the available tools for reproducibility, including limited programming skills and limited awareness of best practices and tools for reproducible research.
- Doing research reproducibly takes time. In a competitive environment, researchers see more benefit in working on more projects than in doing research reproducibly.
- Related to (b), there are limited explicit incentives for doing careful, reproducible research compared with writing more papers, although in some cases researchers may find that their work is more heavily cited and more influential when they make their code available for others to use, particularly as a general software product.
- The funding model for reproducible research has not been worked out yet. In particular, if data and code are to be made available to the public in perpetuity, it is not clear who should pay for that.
Recommendations (specific to a funding agency):
- Develop funding mechanisms to support small-scale software development and data products by researchers in domain areas rather than by software developers. This might fund new development or provide support for software and datasets developed in the course of a grant beyond the lifespan of the grant, particularly for software and data that gain traction in the community. Methodological researchers often produce small software products for which an entire full-size grant is not appropriate. Similarly, scientific researchers may develop a data product that needs to be maintained long-term. Perhaps this could be done via mini-grants that a researcher can apply for as a sort of extension to the main grant; these would not be guaranteed but would not be too hard to get funded.
- Fund work that includes among its aims reproducing and/or replicating previous work, when that previous work is sufficiently important. For example, a proposal presenting a new idea in area X might include as one aim reproducing or replicating a key previous finding in area X on which the new work would build.
- Provide support for the development of appropriate courses. Most students and faculty have little training in how to organize their data and software so that their analyses are reproducible.
- Possible training resources:
- Broman’s course on reproducible research: http://kbroman.org/Tools4RR
- software-carpentry.org and datacarpentry.org workshops
- Consider including a code management plan as part of the current data management plan section of grant proposals (but without requiring more writing in the proposal).
- Duke has implemented reproducible research training as part of a computing bootcamp within its graduate student orientation, and students receive Responsible Conduct of Research credit for it; that is the incentive, since they must earn a certain number of these credits during graduate school. Materials for this workshop are at https://github.com/mine-cetinkaya-rundel/dss_computing_bootcamp.
- Are there ways to increase a proposal's chances of being funded when it includes robust and reliable science steps? Could the likelihood of renewal be increased based on robust and reliable science steps in previous grant work?
- Dissemination of best practices
- The funding agency could provide guidance on what a researcher should do for their study to be considered reasonably reproducible. See, for example, this Nature checklist, http://www.nature.com/authors/policies/checklist.pdf, and the beginnings of a similar (more informal) checklist put together by the University of Washington's eScience group, Open Science and Reproducible Badges: https://github.com/uwescience/reproducible/wiki/%5BDRAFT%5D-Open-Science-and-Reproducible-Badges. Researchers could also be made aware of the Open Science Framework's Badges to Acknowledge Open Practices: https://osf.io/tvyxz/.
- Increased use of statistician reviewers where more attention to design, analysis, inference, and uncertainty quantification would benefit the science (similar to Science Magazine’s Statistical Board of Reviewing Editors S-BoRE).
Recommendations from SBE report, Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science:
- Each report of research supported by NSF should be accompanied by detailed documentation on procedures to enable an independent researcher to reproduce the results of the original researcher, which should be included in a project’s final report and in proposals seeking new support.
- NSF should sponsor research that evaluates various approaches to determining whether a finding replicates and to assess which approach(es) under which circumstances are the most helpful for reaching valid conclusions about replicability.
- To permit assessing replication in various ways, NSF should encourage researchers to report associations between variables using different metrics (e.g., standardized and unstandardized coefficients, effect sizes, odds ratios) and indicating precision of estimates (with standard errors) and to assess the statistical significance of findings using these different methods.
- NSF should sponsor research that identifies optimal procedures for practically assessing all types of generalizability of findings (e.g., from a set of study participants to a population, from one set of measures to other measures, from one set of circumstances to other circumstances) and differentiating lack of generalizability from failure to replicate.
- NSF should fund research exploring the optimal and minimum standards for reporting statistical results so as to permit useful meta-analyses.
- NSF should support research into the use of questionable research practices, the causes that encourage such behavior, and the effectiveness of proposed interventions intended to discourage such behavior and should support the identification of empirically-validated optimal research practices to avoid the production of illusory findings.
- In NSF grant proposals, investigators should be required to describe plans for implementing and fully reporting tests of the robustness of findings using alternate analytical methods (when appropriate). In addition, researchers should be encouraged to design studies that would be theoretically interesting regardless of the outcome, or to seriously consider more than one hypothesis. In grant progress reports and final reports, investigators should be required to describe whether more than one hypothesis was considered, the robustness checks conducted, and the results obtained.
- NSF should sponsor research seeking to document suboptimal practices that are widespread in particular fields, with an eye towards identifying those areas that most depart from the scientific ideals and contribute to non-robust research findings.
- NSF should create a Foundation-wide committee of experts to monitor issues of reproducibility, replicability, and generalizability of findings, to support investigations of these issues and disseminate insights gained both within the Foundation and outside the Foundation, to propose ways to change the NSF granting process to enhance scientific quality and efficiency, and to provide leadership on these issues in the coming decades.
Definitions copied from: “A Simple Explanation for the Replication Crisis in Science,” Roger Peng, August 24, 2016, http://simplystatistics.org/2016/08/24/replication-crisis/.
UPDATE: On October 21, the NSF released a Dear Colleague letter on "Encouraging Reproducibility in Computing and Communications Research." See the letter at https://www.nsf.gov/pubs/2017/nsf17022/nsf17022.jsp and consider submitting a proposal!
UPDATE: On November 3, an article describing NIH's plan to require grant applicants to submit data sharing plans with other grant materials was published...read it here! http://www.rollcall.com/news/nih-require-researchers-receiving-grants-share-data
UPDATE: The ASA has finalized the document, which can be found at http://www.amstat.org/asa/files/pdfs/POL-ReproducibleResearchRecommendations.pdf.