ASA Connect

  • 1.  Copyright and datasets

    Posted 02-12-2021 22:02
    I'm in the middle of writing an article about peer-reviewed datasets (datasets that are included as part of a peer-reviewed publication). One of my concerns is that very few peer-reviewed datasets list the conditions and restrictions on the use of the data. For example, am I allowed to keep a copy of one of these datasets on my public GitHub site? A cursory Google search turns up some sources indicating that data (with very few exceptions) are not copyrightable: data are facts, and facts are not copyrightable. Other sources mention the concept of a creative compilation, which would allow datasets to be copyrighted. If anyone can provide information, ideally through a source that would be considered definitive, I would greatly appreciate it.

    ------------------------------
    Stephen Simon, blog.pmean.com
    Independent Statistical Consultant
    P. Mean Consulting
    ------------------------------


  • 2.  RE: Copyright and datasets

    Posted 02-13-2021 21:30
    Hi Stephen,

    The researchers here might be able to point you in a useful direction: https://www.paperswithcode.com/about
    (By pure coincidence I came across this shortly after reading your post.)

    Best,
    Glen

    ------------------------------
    Glen Wright Colopy
    DPhil Oxon
    Data Scientist at Cenduit LLC, Durham, NC
    ------------------------------



  • 3.  RE: Copyright and datasets

    Posted 02-15-2021 17:04
    Hello there.
    Regarding "copyright" in the technical sense, there is a lot I could say and help with. But equally or more important is the ethics side of datasets: how deidentified they actually are, since that may be what precludes open sharing of some data. Different jurisdictions have different rules and protection acts for data and data sharing.

    It will be interesting to keep an eye on this discussion.

    ------------------------------
    Alberto Nettel-Aguirre
    Professor of Biostatistics
    University of Wollongong
    ------------------------------



  • 4.  RE: Copyright and datasets

    Posted 02-16-2021 08:51
    My thoughts are exactly along Prof. Nettel-Aguirre's lines. Removing PII alone is not necessarily sufficient for deidentification under some emerging regulations (GDPR, for example), and different people have different understandings of what deidentification means. So I am curious how the research community will handle data sets involving human subjects. For example, what would you do if a data set were deemed not completely anonymous under a regulation that turns out to apply?

    ------------------------------
    Michiko Wolcott
    Principal Consultant
    Msight Analytics
    ------------------------------



  • 5.  RE: Copyright and datasets

    Posted 02-15-2021 10:10
    Edited by Lars Vilhuber 02-15-2021 10:11
    I am Data Editor for the American Economic Association, and face this issue on a regular basis. Good guidance is provided by Stodden (2009) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1362040. Facts cannot be copyrighted (in the US!), but products derived from those facts (i.e., the presentation, organization, etc.) typically can be. In general, for the data included in replication packages posted on our repository, we require that authors provide evidence that they have re-distribution rights, whether by explicit permission of the original data owner or via some license (e.g., CC-BY or ODbL). Once the replication package is posted, it is subject to the license shown on the specific deposit (typically CC-BY, but where there is a license file, that file applies). This can sometimes be tricky, and has only been (imperfectly, probably) monitored since July 2019; deposits from before then may not be quite as informative. For a discussion of some of the challenges, see my report from 2019 (https://pubs.aeaweb.org/doi/pdfplus/10.1257/pandp.109.718).

    From a practical point of view, I would never suggest posting a dataset unless you have permission from the original authors (of the data, not necessarily of the original article), or there is an obvious license. Do not assume that the authors of the posted dataset have done their homework: if you repost data that is subject to re-distribution restrictions by the original data owner, you probably won't prevail by pointing at the previous author and saying "he also did it". We have had multiple instances where the previous replication package clearly infringed on data use agreements.


    ------------------------------
    Lars Vilhuber
    Managing Editor, Journal of Privacy and Confidentiality
    Data Editor, American Economic Association
    Economist
    Cornell University - Labor Dynamics Institute
    ------------------------------



  • 6.  RE: Copyright and datasets

    Posted 02-16-2021 12:04
    If data is already posted, why would we need to repost it? Wouldn't a statement like "The data we used is available here: <enter website address>" suffice?

    Also, what if you are reusing data for publication in the same journal? Or what if you want to reanalyze "textbook" data, as in a data set that comes from a textbook you used? How does that affect things?

    ------------------------------
    Andrew Ekstrom

    Statistician, Chemist, HPC Abuser;-)
    ------------------------------