ASA Connect


ASA journals policy regarding data sharing and reproducibility

  • 1.  ASA journals policy regarding data sharing and reproducibility

    Posted 01-17-2018 15:49

    (primary credit for this blog post goes to Eric Sampson, ASA's Journals Manager)

    With the new year we wanted to make you aware of the policies of the ASA's journals with respect to data sharing and reproducibility.  Please be aware that the information below applies only to ASA-owned and managed journals, published with the Association's partner Taylor and Francis. Other ASA-affiliated journals, owned and managed by other societies and publishers, may follow different policies.

    The American Statistical Association strongly encourages authors to include as supplementary material datasets and code that demonstrate the results shown in the final article. ASA journal editors may set their own specific policies based on this broad principle, and authors may request a waiver for reasons of confidentiality or security.

    For example, the Journal of the American Statistical Association, Applications and Case Studies, has now implemented a reproducibility review. Authors are required to submit materials that demonstrate an article's results, and these materials are reviewed by an Associate Editor for Reproducibility (or someone chosen by the AER). This review occurs concurrently with the peer review of the paper itself, but only after a paper is accepted pending major or minor revisions. In addition to being posted as supplements along with the published article, the code and data will also be posted on a public repository such as GitHub and/or Dataverse, with links back to the published article.

    Another example is the Journal of Computational and Graphical Statistics, which requires authors to submit code and data (and any other relevant materials) along with their papers, although extensive review of those materials is not necessarily performed.  

    Beginning in 2018, the ASA's publishing partner, Taylor and Francis, is also introducing a new Basic Data Sharing Policy across all its journal titles. This policy will encourage authors to share the data underlying the published article and make them publicly available, where this does not violate the protection of human subjects or other valid subject privacy concerns. Authors will be further encouraged to cite any data referenced in the paper, whether created by the author or by someone else; cited data sets should also be included in the reference list. Finally, authors will be encouraged to include a Data Availability Statement.

    In addition to this overarching policy, Taylor & Francis will be announcing a suite of data sharing policies that go beyond encouragement to mandating specific types of actions. These policies will range from authors agreeing to share their data upon reasonable request, to making data publicly available, to making data fully open with re-use rights. 

    Data sharing policies will be set at the journal level in consultation with editors and relevant societies or other stakeholders during 2018. Taylor & Francis serves some subject areas that are keen to push forward an open agenda, while other fields need to take smaller steps. Policies will reflect emerging norms in the specific subject area covered by the journal. Our aim is to facilitate a dialogue about data sharing to identify a comfortable starting position for each journal.

    Comments? Post them in the comment section or email me (ron@amstat.org) or Eric (eric@amstat.org).  Thanks!



    ------------------------------
    Ron Wasserstein
    Executive Director
    The American Statistical Association
    Promoting the Practice and Profession of Statistics
    732 N. Washington St.
    Alexandria, VA 22314
    703-684-1221 x1860
    ------------------------------


  • 2.  RE: ASA journals policy regarding data sharing and reproducibility

    Posted 01-19-2018 02:45
    Generally I agree with these requirements. I see no reason not to require that code, for all software used, be included as supplementary material in articles. As for data, including them is usually not possible in the health sciences because of ethical requirements to protect patient privacy and the related legislation. What may be done, however, is to include information on the structure of the raw data. For the construction of all derived variables, code rather than data should be included. The precise procedure for documenting the structure of the raw data depends on the software used. In R, my preferred environment for most analyses, something like the following may often be appropriate:

    empty <- raw[0, , drop = FALSE] # selects no rows, giving an empty data.frame if raw is a data.frame;
    # unlike raw[raw$id<0,], this does not rely on raw$id being positive with no missing values
    dim(empty) # check that empty has 0 rows and therefore contains no data
    rm(raw) # remove the raw data
    ls() # check that the workspace contains only empty; if not, remove all other contents and check again, then save the workspace

    The resulting data.frame empty has no rows and therefore contains no data, but it is still technically a data.frame that carries the types of all variables and the levels of all factor variables. In SPSS one may simply delete all contents in Data View while leaving Variable View unchanged. In other software similar arrangements should be possible.
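
    As a minimal sketch of the procedure described above, the following uses a small made-up data frame. The names raw, id, sex, score, and structure_file are hypothetical and stand in for real study data. It checks that a zero-row copy keeps column names, types, and factor levels while holding no observations, and that this structure survives being saved and read back:

```r
# Hypothetical stand-in for confidential raw data (all values invented)
raw <- data.frame(
  id    = 1:3,
  sex   = factor(c("f", "m", "f"), levels = c("f", "m")),
  score = c(2.5, 3.1, 4.0)
)

# Zero-row copy: same columns, types, and factor levels, but no observations
empty <- raw[0, , drop = FALSE]

# Check that only the structure remains
stopifnot(nrow(empty) == 0)
stopifnot(identical(names(empty), names(raw)))
stopifnot(identical(levels(empty$sex), levels(raw$sex)))
stopifnot(identical(sapply(empty, class), sapply(raw, class)))

# Save only the structure, then confirm it can be read back
structure_file <- tempfile(fileext = ".rds")  # hypothetical location
saveRDS(empty, structure_file)
restored <- readRDS(structure_file)
str(restored)  # displays variable types and factor levels, with 0 observations
```

    Because factor levels are retained, it may be worth checking that no level is itself an identifying value before sharing such a file.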

    ------------------------------
    Tore Wentzel-Larsen
    Researcher
    Norwegian Centre for Violence and Traumatic Stress Studies,
    Regional Center for Child and Adolescent Mental Health, Eastern and Southern Norway
    ------------------------------