I am the Data Editor for the American Economic Association, and I face this issue on a regular basis. Good guidance is provided by Stodden (2009):
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1362040. Facts cannot be copyrighted (in the US!), but products derived from those facts (i.e., their presentation, organization, etc.) typically can be. In general, for data included in replication packages posted on our repository, we require that authors provide evidence that they have redistribution rights, whether by explicit permission of the original data owner or via a license (e.g., CC-BY or ODbL). Once the replication package is posted, it is subject to the license shown on the specific deposit (typically CC-BY, but where there is a license file, that file applies). This can sometimes be tricky, and it has only been monitored (probably imperfectly) since July 2019; deposits from before then may not be quite as informative. For a discussion of some of the challenges, see my report from 2019 (https://pubs.aeaweb.org/doi/pdfplus/10.1257/pandp.109.718).
From a practical point of view, I would never suggest posting a dataset unless you have permission from the original authors (of the data, not necessarily of the original article) or there is an obvious license. Do not assume that the authors of the posted dataset have done their homework: if you repost data that is subject to redistribution restrictions by the original data owner, you probably won't prevail by pointing at the previous author and saying "he also did it". We have had multiple instances where the previous replication package clearly infringed on data use agreements.
------------------------------
Lars Vilhuber
Managing Editor, Journal of Privacy and Confidentiality
Data Editor, American Economic Association
Economist
Cornell University - Labor Dynamics Institute
------------------------------
Original Message:
Sent: 02-12-2021 22:01
From: Stephen Simon
Subject: Copyright and datasets
I'm in the middle of writing an article about peer-reviewed datasets (datasets that are included as part of a peer-reviewed publication). One of my concerns is that very few peer-reviewed datasets list the conditions and restrictions on the use of the data. For example, am I allowed to keep a copy of one of these datasets on my public GitHub site? A cursory search through Google turns up some sources indicating that data (with very few exceptions) are not copyrightable: data are facts, and facts are not copyrightable. Other sources mention the concept of a creative compilation, which would allow datasets to be copyrighted. If anyone can provide information, ideally through a source that would be considered definitive, I would greatly appreciate it.
------------------------------
Stephen Simon, blog.pmean.com
Independent Statistical Consultant
P. Mean Consulting
------------------------------