hoad's Introduction

Lifecycle: experimental

Hybrid OA Dashboard

Bibliometric data analytics to increase cost transparency in hybrid open access transformation contracts.

Overview

Many academic publishers offer hybrid open access (hybrid OA) journals, in which some articles in an otherwise subscription-based publication are made openly available. Recently, some funders have pushed for a transformation towards such a hybrid OA business model, where publishing houses are paid for open access publication. To draft, monitor and evaluate such transformative agreements, libraries and their consortia need data on the uptake, costs and impact of hybrid OA.

{HOAD} is a data product to meet this need. The dashboard is packaged as an extension to the R Project for Statistical Computing (an R package), released under an open source license and developed in the open at http://github.com/subugoe/hoad. The package has several components:

  1. APIs to expose data from public bibliometric sources relevant to hybrid OA.
  2. ETL pipelines (extraction, transformation, loading) and accompanying visualisations to answer hybrid OA business questions.
  3. A web application to explore hybrid OA data, including customisation for individual journal portfolios.

The project is based on data gathered by the Crossref DOI registration agency and the OpenAPC initiative. The package is developed at the Göttingen State and University Library as part of the DFG-funded project of the same name, Hybrid Open Access Dashboard.

An early prototype of the application, including the interactive web frontend, is available at https://subugoe.github.io/hoad/.

hoad's People

Contributors

ahobert, jhoeffler, katrinleinweber, maxheld83, njahn82

hoad's Issues

consider subsetting via crosstalk or similar instead of shiny

Right now, I'm not 100% sure that what the dashboard currently does actually needs a shiny runtime.

I'm going to have to look into the current features of crosstalk etc. to see whether this can be done.

If we can get ~80% of what we want without a shiny runtime, that might be worth considering, because it would dramatically simplify deployment, cut costs and increase speed.

also get data for full open access

Not sure this is relevant, but as an open source fan but bibliometric noob, I wondered how the hybrid journals compared to full open access (and foss) journals.
Might be nice to see the trends over time in both and compare the magnitudes.

Maybe this isn't relevant or interesting to the expert audience.

check retrieval of crossref and openAPC match with random sample

also necessary for #51

From the grant proposal:

The affiliations obtained in this way are then matched against the list of institutions contributing to the Open APC initiative. The quality of the retrieval is checked qualitatively using a random sample. Data collection and curation, in particular the harmonisation of institution names, are supported by a student research assistant. Should the automated procedure prove too time-consuming during the calibration phase because of heterogeneous institution names, a random sample will be drawn in the course of the project instead of the planned full survey.
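
Drawing the random sample for the manual quality check could look roughly like the sketch below. It is written in Python for illustration only and is not part of the package; `draw_validation_sample`, the seed and the DOIs are made up. The fixed seed is an assumption that the check should be repeatable.

```python
import random


def draw_validation_sample(matched_dois, sample_size, seed=42):
    """Return a reproducible random sample of matched DOIs for manual review."""
    unique = sorted(set(matched_dois))   # de-duplicate and fix the order before sampling
    rng = random.Random(seed)            # a fixed seed makes the sample repeatable
    k = min(sample_size, len(unique))    # never ask for more items than exist
    return rng.sample(unique, k)


# Made-up DOIs standing in for Crossref records matched to OpenAPC.
matched = [f"10.1234/example.{i}" for i in range(500)]
sample = draw_validation_sample(matched, sample_size=50)
```

Each sampled DOI would then be checked by hand against the publisher page to judge whether the affiliation match is correct.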

expose coverage of open APC cost info

From the project proposal:

In addition to the focus on how publishers implement recommendations for cost transparency, the coverage of the Open APC initiative is also analysed. On the one hand, the task is to check how comprehensively Open APC covers the hybrid open access output of its participating institutions. This can serve as an indicator of whether distributed payment streams exist at these institutions. On the other hand, institutions that do not yet participate in the initiative are to be identified.
The data basis is again those open access articles in subscription-based journals that were obtained at the beginning of the work step using Crossref license information. First, the set difference is formed to obtain articles with no record in the Open APC dataset. According to current findings from our preliminary work, it comprises around 82,000 articles. Next, an automated procedure is developed that extracts, for each article, the contact details (email address) and institutional affiliation of the corresponding authors from Crossref or, where these are unavailable, from the publicly available full text. The affiliations obtained in this way are then matched against the list of institutions contributing to the Open APC initiative. The quality of the retrieval is checked qualitatively using a random sample. Data collection and curation, in particular the harmonisation of institution names, are supported by a student research assistant. Should the automated procedure prove too time-consuming during the calibration phase because of heterogeneous institution names, a random sample will be drawn in the course of the project instead of the planned full survey.
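
The set difference described above can be sketched as follows. This is an illustrative Python sketch, not the project's implementation; the function names and the example DOIs are hypothetical. It assumes both sources are compared on DOIs, which are case-insensitive and may carry a resolver prefix.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(doi):
    """Lower-case a DOI and strip a leading resolver prefix, if any."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def missing_from_openapc(crossref_dois, openapc_dois):
    """Hybrid OA articles found via Crossref that have no record in OpenAPC."""
    crossref = {normalise_doi(d) for d in crossref_dois}
    openapc = {normalise_doi(d) for d in openapc_dois}
    return crossref - openapc


def openapc_coverage(crossref_dois, openapc_dois):
    """Share of the Crossref hybrid OA articles that OpenAPC also reports."""
    crossref = {normalise_doi(d) for d in crossref_dois}
    openapc = {normalise_doi(d) for d in openapc_dois}
    return len(crossref & openapc) / len(crossref)


# Made-up example DOIs.
crossref_sample = ["10.1000/A1", "10.1000/a2", "https://doi.org/10.1000/a3"]
openapc_sample = ["10.1000/a1"]
missing = missing_from_openapc(crossref_sample, openapc_sample)
rate = openapc_coverage(crossref_sample, openapc_sample)
```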

make data tidy

I am wondering whether the canonical object hybrid_df shouldn't be a list of several dfs, rather than one with duplication.
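
One way to picture the proposed split is sketched below, in Python for illustration only. The key names (`issn`, `journal_title`, `publisher`, `year`, `articles`) are assumptions and not the actual columns of hybrid_df; the sample values are illustrative.

```python
def split_hybrid_records(rows):
    """Split flat rows into a journal table and a yearly count table."""
    journals = {}
    yearly_counts = []
    for row in rows:
        # Journal-level attributes are stored once per ISSN instead of once per row.
        journals[row["issn"]] = {
            "issn": row["issn"],
            "journal_title": row["journal_title"],
            "publisher": row["publisher"],
        }
        # Year-level observations keep only the ISSN as a key back to the journal.
        yearly_counts.append(
            {"issn": row["issn"], "year": row["year"], "articles": row["articles"]}
        )
    return {"journals": list(journals.values()), "yearly_counts": yearly_counts}


flat = [
    {"issn": "0393-2990", "journal_title": "European Journal of Epidemiology",
     "publisher": "Springer Nature", "year": 2014, "articles": 102},
    {"issn": "0393-2990", "journal_title": "European Journal of Epidemiology",
     "publisher": "Springer Nature", "year": 2015, "articles": 124},
    {"issn": "0091-7613", "journal_title": "Geology",
     "publisher": "Geological Society of America", "year": 2017, "articles": 310},
]
tidy = split_hybrid_records(flat)
```

The journal title and publisher then appear once per journal, and the yearly table carries only what changes from year to year.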

rename rproj file

Just a small thing: the *.rproj file currently has a different name than the repo, which trips me up when opening it.

Change "Institutional spending" caption

Minor issue: Maybe you should make it clearer that the lower right plot only relates to the yellow OpenAPC share of the current selection, not the whole set of articles. Otherwise one might draw some very wrong conclusions here.

Otherwise, a great project!

compare results with scopus database by kompetenzzentrum bibliometrie

From the grant proposal:

Under a service contract, the results of the analyses are to be compared with the Scopus database of the Kompetenzzentrum Bibliometrie (Competence Centre for Bibliometrics). The focus is on questions concerning the indexing of the journals in Scopus, the publication type, the subject classification, the publication year and the institutional affiliation of the authors. Scopus was chosen because, compared with the Web of Science, the database covers a broader range of journals.

Recommended usage of "start_date"

https://subugoe.github.io/hybrid_oa_dashboard/about.html states:

For being best represented in this dashboard, publishers will have to make sure to include license URL element license_ref and a start_date equal to the date of publication in the licensing metadata, which helps to identify open access journal content as well as to differentiate between immediate and delayed open access.

Crossref defined "start_date" as "optional", meaning that the license is applied to the published resource immediately if no start_date is provided, see https://github.com/CrossRef/rest-api-doc/blob/master/funder_kpi_metadata_best_practice.md#license-information. Is there really a need to expand this recommendation?
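
The distinction between immediate and delayed open access that this question turns on can be sketched as follows. This is a minimal Python illustration, not the dashboard's code; `classify_oa` and its parameters are hypothetical. It assumes that a missing start_date means the license applies from the date of publication, in line with the Crossref guidance linked above.

```python
from datetime import date


def classify_oa(published, license_start=None):
    """Classify an openly licensed article as immediate or delayed OA."""
    # Assumption: no start_date means the license applies from publication onwards.
    if license_start is None or license_start <= published:
        return "immediate"
    return "delayed"


no_start_date = classify_oa(date(2017, 1, 10))
same_day = classify_oa(date(2017, 1, 10), date(2017, 1, 10))
after_embargo = classify_oa(date(2017, 1, 10), date(2018, 1, 10))
```

Under this reading, an explicit start_date equal to the publication date and an omitted start_date lead to the same classification.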

rename repo to hoad

this is just a really small thing, but hybrid_oa_dashboard is getting a bit long to type.

Would it be alright to rename the (nascent) package and repo hoad @njahn82?

All existing URLs etc. would remain active or be forwarded.

available::available('hoad') suggests it should be alright:

Name valid: ✔
Available on CRAN: ✔ 
Available on Bioconductor: ✔
Available on GitHub:  ✔ 
Abbreviations: http://www.abbreviations.com/hoad
Wikipedia: https://en.wikipedia.org/wiki/hoad
Wiktionary: https://en.wiktionary.org/wiki/hoad
Urban Dictionary:
  [heart] of a [dragon]
  http://hoad.urbanup.com/11027756
Sentiment:???

expose coverage of publication metadata from crossref data

From the project proposal:

Based on the dataset, the Crossref metadata are first examined to determine whether and to what extent they cover the reporting dimensions identified in AP 1.1 (work package 1.1). On the publisher side, the Crossref metadata profile already enables publishers to document important information for ensuring cost transparency. This includes license and funding information as well as details on authors, including their ORCID and institutional affiliations.
It is unclear, however, what share of the journals that demonstrably published open access articles under the hybrid model make full and quality-conscious use of the Crossref metadata profile. To answer this question, coverage rates for the relevant metadata fields are collected in this work step. To assess the quality of the metadata, recall (the probability that the corresponding publisher information is available via Crossref) and precision (the probability that this information is correctly represented in Crossref) are determined.
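
Recall and precision as defined above could be computed from a manually checked sample along these lines. This is a Python sketch under stated assumptions: each checked item pairs the value stated by the publisher with the value found in Crossref (or `None` if Crossref has nothing), and the function name and sample values are hypothetical.

```python
def recall_and_precision(checked):
    """checked: list of (publisher_value, crossref_value or None) pairs."""
    # Only items where the publisher actually states a value are considered.
    with_truth = [(truth, found) for truth, found in checked if truth is not None]
    available = [(truth, found) for truth, found in with_truth if found is not None]
    correct = [pair for pair in available if pair[0] == pair[1]]
    recall = len(available) / len(with_truth)    # publisher info is available via Crossref
    precision = len(correct) / len(available)    # available info is also correct
    return recall, precision


# Made-up license URLs for four checked articles.
checked_sample = [
    ("cc-by", "cc-by"),
    ("cc-by-nc", "cc-by"),   # present in Crossref, but wrong
    ("cc-by", None),         # missing from Crossref
    ("cc-by", "cc-by"),
]
recall, precision = recall_and_precision(checked_sample)
```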

store historical data

This actually has a pretty high priority, because we don't want to lose any historical data.

separate out ETL and plotting functions

This is already largely the case, ETL is in R/, plotting in the dashboard.

I think the separation of concerns might go further here, by making functions for both.
ETL functions would then return a hybrid_df object, which plotting functions can, well, plot.

forward shinyapps url to https://subugoe.shinyapps.io/hoad/

if possible, I'd like to put in a simple redirect from the shinyapps.io url https://subugoe.shinyapps.io/hybridoa/ to https://subugoe.github.io/hybrid_oa_dashboard/.
On https://subugoe.github.io/hybrid_oa_dashboard/ the existing shinyapps.io instance would then be included as an iframe.

Advantages:

  • we can present everything from one URL which we control
  • if we ever leave shinyapps.io, no one will notice or need to change anything
  • the old shinyapps.io URL, under which this was first publicised, remains active
  • we can implement full-on branch deploys as per https://github.com/subugoe/hybrid_oa_dashboard/issues/56 easily

Journal dropdown list incomplete

There's an issue with the journal dropdown selection: the list gets cut off at 1000 entries (this happens when the publisher is set to "all" or "Springer Nature").
To save you the first 100 meters of the rabbit hole: the reason is that selectInput uses a JS library called selectize.js in its default mode, and that library limits the select box options to 1000 by default ("maxOptions").

decide deployment strategy

This might be too early now, though deciding this sometime soon might help with some other decisions (especially reproducibility and testing).

If we end up using a shiny app (and not #24), I see three possibilities:

  1. RStudio Connect, hosted on-prem at GWDG.
    License is expensive, but has lots of good stuff, especially if we want to roll this out to external customers.
  2. shinyapps.io (status quo, easy to do, though somewhat limited reproducibility and no support for plumber, if needed).
  3. roll-your-own based on shiny server open source, maybe via shinyproxy

How to deal with multiple ISSNs when Crossref gives different record counts

It seems that 31 journals are affected:

issn | journal_title | publisher | year | records
0907-4449 | Acta Crystallographica Section D Biological Crystallography | International Union of Crystallography (IUCr) | 2013 | 257
1399-0047 | Acta Crystallographica Section D Biological Crystallography | International Union of Crystallography (IUCr) | 2013 | 158
0002-0729 | Age and Ageing | Oxford University Press (OUP) | 2017 | 666
1468-2834 | Age and Ageing | Oxford University Press (OUP) | 2017 | 667
0923-7534 | Annals of Oncology | Oxford University Press (OUP) | 2016 | 3858
1569-8041 | Annals of Oncology | Oxford University Press (OUP) | 2016 | 3859
1756-1833 | BMJ | BMJ | 2016 | 4073
2059-8688 | BMJ | BMJ | 2016 | 29
1756-1833 | BMJ | BMJ | 2017 | 3341
2059-8688 | BMJ | BMJ | 2017 | 35
1756-1833 | BMJ | BMJ | 2018 | 910
2059-8688 | BMJ | BMJ | 2018 | 13
0008-6363 | Cardiovascular Research | Oxford University Press (OUP) | 2013 | 336
1755-3245 | Cardiovascular Research | Oxford University Press (OUP) | 2013 | 241
0008-6363 | Cardiovascular Research | Oxford University Press (OUP) | 2014 | 921
1755-3245 | Cardiovascular Research | Oxford University Press (OUP) | 2014 | 801
1058-4838 | Clinical Infectious Diseases | Oxford University Press (OUP) | 2015 | 1137
1537-6591 | Clinical Infectious Diseases | Oxford University Press (OUP) | 2015 | 1140
1099-5129 | EP Europace | Oxford University Press (OUP) | 2016 | 1115
1532-2092 | EP Europace | Oxford University Press (OUP) | 2016 | 1116
1099-5129 | EP Europace | Oxford University Press (OUP) | 2017 | 1681
1532-2092 | EP Europace | Oxford University Press (OUP) | 2017 | 1682
0393-2990 | European Journal of Epidemiology | Springer Nature | 2014 | 102
1573-7284 | European Journal of Epidemiology | Springer Nature | 2014 | 109
0393-2990 | European Journal of Epidemiology | Springer Nature | 2015 | 124
1573-7284 | European Journal of Epidemiology | Springer Nature | 2015 | 126
1101-1262 | European Journal of Public Health | Oxford University Press (OUP) | 2013 | 1023
1464-360X | European Journal of Public Health | Oxford University Press (OUP) | 2013 | 1037
1101-1262 | European Journal of Public Health | Oxford University Press (OUP) | 2014 | 1212
1464-360X | European Journal of Public Health | Oxford University Press (OUP) | 2014 | 1240
1101-1262 | European Journal of Public Health | Oxford University Press (OUP) | 2015 | 1651
1464-360X | European Journal of Public Health | Oxford University Press (OUP) | 2015 | 1679
1101-1262 | European Journal of Public Health | Oxford University Press (OUP) | 2016 | 1652
1464-360X | European Journal of Public Health | Oxford University Press (OUP) | 2016 | 1680
0168-6496 | FEMS Microbiology Ecology | Oxford University Press (OUP) | 2014 | 188
1574-6941 | FEMS Microbiology Ecology | Oxford University Press (OUP) | 2014 | 35
0378-1097 | FEMS Microbiology Letters | Oxford University Press (OUP) | 2014 | 300
1574-6968 | FEMS Microbiology Letters | Oxford University Press (OUP) | 2014 | 20
0378-1097 | FEMS Microbiology Letters | Oxford University Press (OUP) | 2015 | 11
1574-6968 | FEMS Microbiology Letters | Oxford University Press (OUP) | 2015 | 290
1567-1356 | FEMS Yeast Research | Oxford University Press (OUP) | 2015 | 39
1567-1364 | FEMS Yeast Research | Oxford University Press (OUP) | 2015 | 128
1567-1356 | FEMS Yeast Research | Oxford University Press (OUP) | 2016 | 1
1567-1364 | FEMS Yeast Research | Oxford University Press (OUP) | 2016 | 113
0091-7613 | Geology | Geological Society of America | 2017 | 310
1943-2682 | Geology | Geological Society of America | 2017 | 159
0964-6906 | Human Molecular Genetics | Oxford University Press (OUP) | 2017 | 458
1460-2083 | Human Molecular Genetics | Oxford University Press (OUP) | 2017 | 466
0268-1161 | Human Reproduction | Oxford University Press (OUP) | 2017 | 296
1460-2350 | Human Reproduction | Oxford University Press (OUP) | 2017 | 300
1876-3405 | International Health | Oxford University Press (OUP) | 2017 | 46
1876-3413 | International Health | Oxford University Press (OUP) | 2017 | 45
0300-5771 | International Journal of Epidemiology | Oxford University Press (OUP) | 2015 | 1234
1464-3685 | International Journal of Epidemiology | Oxford University Press (OUP) | 2015 | 1236
0027-8874 | JNCI: Journal of the National Cancer Institute | Oxford University Press (OUP) | 2016 | 300
1460-2105 | JNCI: Journal of the National Cancer Institute | Oxford University Press (OUP) | 2016 | 301
0305-7453 | Journal of Antimicrobial Chemotherapy | Oxford University Press (OUP) | 2015 | 489
1460-2091 | Journal of Antimicrobial Chemotherapy | Oxford University Press (OUP) | 2015 | 490
0021-8898 | Journal of Applied Crystallography | International Union of Crystallography (IUCr) | 2013 | 239
1600-5767 | Journal of Applied Crystallography | International Union of Crystallography (IUCr) | 2013 | 20
0022-1899 | Journal of Infectious Diseases | Oxford University Press (OUP) | 2015 | 737
1537-6613 | Journal of Infectious Diseases | Oxford University Press (OUP) | 2015 | 743
0022-1899 | Journal of Infectious Diseases | Oxford University Press (OUP) | 2017 | 724
1537-6613 | Journal of Infectious Diseases | Oxford University Press (OUP) | 2017 | 725
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2013 | 135
2198-1833 | Journal of Public Health | Springer Nature | 2013 | 18
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2014 | 116
2198-1833 | Journal of Public Health | Springer Nature | 2014 | 47
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2015 | 196
2198-1833 | Journal of Public Health | Springer Nature | 2015 | 53
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2016 | 164
2198-1833 | Journal of Public Health | Springer Nature | 2016 | 80
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2017 | 180
2198-1833 | Journal of Public Health | Springer Nature | 2017 | 95
1741-3842 | Journal of Public Health | Oxford University Press (OUP) | 2018 | 88
2198-1833 | Journal of Public Health | Springer Nature | 2018 | 35
1072-0502 | Learning & Memory | Cold Spring Harbor Laboratory | 2014 | 38
1549-5485 | Learning & Memory | Cold Spring Harbor Laboratory | 2014 | 86
1072-0502 | Learning & Memory | Cold Spring Harbor Laboratory | 2015 | 7
1549-5485 | Learning & Memory | Cold Spring Harbor Laboratory | 2015 | 73
1360-9947 | Molecular Human Reproduction | Oxford University Press (OUP) | 2017 | 68
1460-2407 | Molecular Human Reproduction | Oxford University Press (OUP) | 2017 | 71
1360-9947 | Molecular Human Reproduction | Oxford University Press (OUP) | 2018 | 12
1460-2407 | Molecular Human Reproduction | Oxford University Press (OUP) | 2018 | 23
0035-8711 | Monthly Notices of the Royal Astronomical Society | Oxford University Press (OUP) | 2018 | 936
1365-2966 | Monthly Notices of the Royal Astronomical Society | Oxford University Press (OUP) | 2018 | 998
0963-0252 | Plasma Sources Science and Technology | IOP Publishing | 2016 | 113
1361-6595 | Plasma Sources Science and Technology | IOP Publishing | 2016 | 216
0963-0252 | Plasma Sources Science and Technology | IOP Publishing | 2017 | 1
1361-6595 | Plasma Sources Science and Technology | IOP Publishing | 2017 | 185
0963-0252 | Plasma Sources Science and Technology | IOP Publishing | 2018 | 19
1361-6595 | Plasma Sources Science and Technology | IOP Publishing | 2018 | 84
1741-0126 | Protein Engineering Design and Selection | Oxford University Press (OUP) | 2017 | 66
1741-0134 | Protein Engineering Design and Selection | Oxford University Press (OUP) | 2017 | 67
1462-0324 | Rheumatology | Oxford University Press (OUP) | 2014 | 1067
1462-0332 | Rheumatology | Oxford University Press (OUP) | 2014 | 1133
1462-0324 | Rheumatology | Oxford University Press (OUP) | 2015 | 365
1462-0332 | Rheumatology | Oxford University Press (OUP) | 2015 | 1061
1462-0324 | Rheumatology | Oxford University Press (OUP) | 2016 | 367
1462-0332 | Rheumatology | Oxford University Press (OUP) | 2016 | 1061
2041-8205 | The Astrophysical Journal | IOP Publishing | 2014 | 374
2041-8213 | The Astrophysical Journal | IOP Publishing | 2014 | 673
0035-9203 | Transactions of The Royal Society of Tropical Medicine and Hygiene | Oxford University Press (OUP) | 2017 | 95
1878-3503 | Transactions of The Royal Society of Tropical Medicine and Hygiene | Oxford University Press (OUP) | 2017 | 96
2194-4946 | Zeitschrift für Kristallographie - Crystalline Materials | Walter de Gruyter GmbH | 2013 | 102
2196-7105 | Zeitschrift für Kristallographie - Crystalline Materials | Walter de Gruyter GmbH | 2013 | 49
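
A check like the one behind this listing could be sketched as follows, in Python for illustration. The first four rows are taken from the listing above; the last two are made up to show a journal that is not flagged. Grouping by title, publisher and year is an assumption, and the sketch only detects the mismatches; it does not settle which count should be used.

```python
from collections import defaultdict

rows = [
    ("0002-0729", "Age and Ageing", "Oxford University Press (OUP)", 2017, 666),
    ("1468-2834", "Age and Ageing", "Oxford University Press (OUP)", 2017, 667),
    ("1756-1833", "BMJ", "BMJ", 2016, 4073),
    ("2059-8688", "BMJ", "BMJ", 2016, 29),
    # Made-up rows: both ISSNs return the same count, so nothing is flagged.
    ("1234-5678", "Hypothetical Journal", "Example Press", 2016, 10),
    ("8765-4321", "Hypothetical Journal", "Example Press", 2016, 10),
]


def find_count_mismatches(rows):
    """Return journal-years whose ISSNs yield different record counts."""
    groups = defaultdict(dict)
    for issn, title, publisher, year, records in rows:
        groups[(title, publisher, year)][issn] = records
    return {
        key: counts
        for key, counts in groups.items()
        if len(set(counts.values())) > 1   # more than one distinct count
    }


mismatches = find_count_mismatches(rows)
```

Because the publisher is part of the grouping key, the two different journals titled Journal of Public Health are kept apart rather than compared with each other.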

replace data dumps with db queries

I am wondering whether we should, as far as possible, replace the database dumps (i.e. *.json or *.csv) with (properly cached) queries.
I think this might aid reproducibility and make the whole thing more scalable, and automatically updated etc.

I don't know nearly enough about the licenses, authentication and rate limits of the upstream data sources, but if necessary, with a proper cache or even our own database, this should be possible.

We could then expose a hybrid_df() function, which returns a tibble (or list of tibbles) from the query.
That would also be a nice place to document all of the details of the fields etc.

We might then, in turn, expose this ETLd query as a plumber API, which a) we could use ourselves in the dashboard and b) other people could use, even if they don't necessarily like or use R.

This would be a nice design where we fully separate the concerns between ETL and analysis/plotting, etc. both in terms of functions and infrastructure. (see #43).

The ETL functions would then drive the plumber API.

The plumbing and analysis functions would drive the dashboard.
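
The idea of a properly cached query can be illustrated with the sketch below. It is written in Python purely to show the caching logic and is not a proposal for the package's API; `cached_query`, `fake_fetch` and the one-day default maximum age are all hypothetical, and `fake_fetch` stands in for a real upstream request.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def cached_query(fetch, params, cache_dir, max_age_seconds=86400):
    """Return fetch(params), reusing a cached JSON result if it is fresh enough."""
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return json.loads(path.read_text())          # cache hit: no upstream call
    result = fetch(params)                            # cache miss: query upstream
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result


calls = []


def fake_fetch(params):
    """Stand-in for an upstream query; records how often it is called."""
    calls.append(params)
    return [{"issn": params["issn"], "year": 2017, "articles": 1}]


with tempfile.TemporaryDirectory() as tmp:
    first = cached_query(fake_fetch, {"issn": "0002-0729"}, tmp)
    second = cached_query(fake_fetch, {"issn": "0002-0729"}, tmp)
```

The second call is served from disk, so the upstream source is only hit once per parameter set within the maximum age.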

Unpaywall integration

Unpaywall Data is a great and popular source to identify hybrid open access journal articles. Although Unpaywall also makes heavy use of Crossref, methods and results differ. Unpaywall checks not only Crossref, but also publisher websites directly to find OA articles with open licenses. Contrary to our approach, Unpaywall Data does not indicate when an article was made OA with an open content license. Consequently, we cannot distinguish between immediate and delayed OA provided by toll-access journals using Unpaywall Data alone. However, because not all publishers share license metadata with Crossref, and because some journals, such as the Journal of the American College of Cardiology, make articles available after a certain period, we thought that integrating Unpaywall Data would be useful to explore the number of OA articles currently provided by toll-access journals in our sample.

We used the most recent Unpaywall data dump from September 2018. R/unpaywall_integration.R shows how we queried the dump, which we had previously stored on Google BigQuery, for OA journal articles where Unpaywall found license information. The script also shows how the indicator jn_y_unpaywall_others was calculated, which is now present in our dataset data/hybrid_publications.csv. Results are presented on the page "Overview", tab "Other types of OA license information detected by Unpaywall".

First draft live with commit subugoe/hybrid_oa_dashboard@b53d851. We need to add this to the long-form and data documentation.

Plant Biotechnology Journal (Wiley) was not hybrid in 2017

I think PBJ switched from closed to open in 2017:

"Welcome to the first issue of the fifteenth volume of Plant Biotechnology Journal. I would like to start this editorial by announcing the successful transition of PBJ from a subscription‐based journal to an open access journal supported exclusively by authors. This resulted in enhanced free global access to all readers." -- https://doi.org/10.1111/pbi.12687

I'm thus surprised to see that it's in the hybridOA table with 187 articles out of 202 (92.57%) for 2017. Could this be an error such as including 2016 publications, which might account for the 15 non-OA articles? It just seems odd to me.

OpenAPC shares larger than crossref results

Just for clarification: There are cases where the article count in OpenAPC is larger than the count reported by the monitor (Wiley-Blackwell 2014/15 or SpringerNature 2015). This means that your API calls could not identify those missing articles as hybrid OA in Crossref because they are not correctly tagged with a CC license in license_ref, correct?

Offsetting collection may also include fully oa journals

Observations:

institution | period | euro | doi | is_hybrid | publisher | journal_full_title | issn | issn_print | issn_electronic | issn_l | license_ref | indexed_in_crossref | pmid | pmcid | ut | url | doaj | hybrid_type | country | country_name
MPG | 2017 | NA | 10.1080/2055074x.2017.1287535 | FALSE | Informa UK Limited | Catalysis, Structure & Reactivity | 2055-074X | 2055-074X | 2055-0758 | 2055-0758 | NA | TRUE | NA | NA | ut:000411945400013 | NA | FALSE | Open APC (Offsetting) | DEU | Germany
MPG | 2017 | NA | 10.1080/23802359.2017.1289349 | FALSE | Informa UK Limited | Mitochondrial DNA Part B | 2380-2359 | NA | 2380-2359 | 2380-2359 | NA | TRUE | NA | NA | ut:000402042700049 | NA | FALSE | Open APC (Offsetting) | DEU | Germany
