Friday, January 20, 2017

The NIH Public Access Policy: A triumph of green open access?

There has always been a contradiction at the heart of the open access movement. Let me explain.

The Budapest Open Access Initiative (BOAI) defined open access as being the:

“free availability [of research papers] on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. The only constraint on reproduction and distribution, and the only role for copyright in this domain, should be to give authors control over the integrity of their work and the right to be properly acknowledged and cited.”

BOAI then proceeded to outline two strategies for achieving open access: (I) self-archiving; (II) a new generation of open-access journals. These two strategies later became known, respectively, as green OA and gold OA.

At the time of the BOAI meeting the Creative Commons licences had not been released. When they were, OA advocates began to insist that to meet the BOAI definition, research papers had to have a CC BY licence attached, thereby signalling to the world that anyone was free to share, adapt and reuse the work for any purpose, even commercially.

For OA purists, therefore, a research paper can only be described as open access if it has a CC BY licence attached.

The problem here, of course, is that the vast majority of papers deposited in repositories cannot be made available on a CC BY basis, because green OA assumes authors continue to publish in subscription journals and then self-archive a copy of their work in an open repository.

Since publishing in a subscription journal requires assigning copyright (or exclusive publishing rights) to a publisher, and few (if any) subscription publishers will allow papers that are earning them subscription revenues to be made available with a CC BY licence attached, we can see the contradiction built into the open access movement. Quite simply, green OA cannot meet the definition of open access prescribed by BOAI.

To see how this works in practice, let’s consider the National Institutes of Health (NIH) Public Access Policy. This is described on Wikipedia as an “open access mandate”, and by Nature as a green OA policy, since it requires that all papers published as a result of NIH funding be made freely available in the NIH repository PubMed Central (PMC) within 12 months of publication. In fact, the NIH policy is viewed as the premier green OA policy.

But how many of the papers being deposited in PMC in order to comply with the Policy have a CC BY licence attached and so are, strictly speaking, open access?

There are currently 4.2 million articles in PMC. Of these around 1.5 million consist of pre-2000 historical content being deposited as part of the NIH’s scanning projects. Some of these papers are still under copyright, some are in the public domain, and some are available CC BY-NC. However, since this is historical material pre-dating both the open access movement and the NIH Policy let’s put it aside.

That leaves us with around 2.7 million papers in PMC that have been published since 2000. Today around 24% of these papers have a CC BY licence attached. In other words, some 76% of the post-2000 papers in PMC are not open access as defined by BOAI.
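For readers who want to check the arithmetic, here is a minimal sketch in Python. It uses only the rounded figures quoted above; the variable names are my own, and the absolute paper counts it derives are rough estimates rather than numbers supplied by PMC.

```python
# Rounded figures quoted in the post, in millions of articles.
total_articles = 4.2       # everything currently in PMC
historical_pre_2000 = 1.5  # scanned pre-2000 content, set aside
cc_by_share = 0.24         # share of post-2000 papers with a CC BY licence

# Papers published since 2000.
post_2000 = total_articles - historical_pre_2000

# Approximate split of the post-2000 papers (derived estimates only).
cc_by_papers = post_2000 * cc_by_share
non_cc_by_share = 1 - cc_by_share
non_cc_by_papers = post_2000 - cc_by_papers

print(f"Post-2000 papers: about {post_2000:.1f} million")
print(f"With CC BY: about {cc_by_papers:.2f} million ({cc_by_share:.0%})")
print(f"Without CC BY: about {non_cc_by_papers:.2f} million ({non_cc_by_share:.0%})")
```

Because the inputs are rounded, the derived counts should be read as orders of magnitude, not exact totals.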

The good news is that the percentage with a CC BY licence is growing, and the table below (kindly put together for me by PMC) shows this growth. In 2008, just 8% of the papers in PMC had a CC BY licence attached. Since then the percentage has grown to 12% in 2010, 14% in 2012, 19% in 2014 and, as noted, it stands at 24% today. 

So, although the majority of papers in PMC today are not strictly speaking open access, the percentage that are is growing over time. Is this a triumph of green OA? Let’s consider.

There are two submission routes to PMC. Where there is an agreement between NIH and a publisher, research papers can be input directly into PMC by that publisher. Authors, and publishers with no PMC agreement, have to use the NIH Manuscript Submission System (NIHMS).

The table above shows that “author manuscripts” that came via the NIHMS route account for just 19% of the content in PMC. And since publishers without a PMC agreement also have to use that route, the share actually self-archived by authors will be lower still. So the overwhelming majority of papers being uploaded to PMC are being uploaded not by authors, but by publishers, and it seems safe to assume that those papers with a CC BY licence attached (currently 24% of the total) will have been published as gold OA rather than under the subscription model.

We could also note that just 0.06% of the papers in PMC today that were deposited via the NIHMS have a CC BY licence attached, and we can assume that these were submitted by gold publishers that do not have an agreement allowing for direct deposit, rather than by authors. 

In short, it would seem that the growth in CC BY papers in PMC is a function of the growth of gold OA, not green OA. As such, we might want to conclude that the success of PMC is a triumph of gold OA rather than of green OA.

Does this matter? The answer will probably depend on one’s views of the merits of article-processing charges, which I think it safe to assume most of the papers in PMC with a CC BY licence will have incurred.

Either way, the fact that today 76% of the post-2000 content in PMC – the world’s premier open repository – still cannot meet the BOAI definition of open access suggests that the OA movement still has a way to go.