metacheck's Introduction

Badges: Main · Codecov test coverage · CRAN status · Lifecycle: experimental

Open Access Metadata Compliance Checker

Automatically check metadata compliance for funded open access articles.

Check your own DOIs · Look at an example

Why Metadata Compliance Matters

Open access grants and transformation contracts with publishers increasingly require licensing metadata.

General metadata recommendations:

Supporting Libraries and OA Funders

The compliance checker helps libraries and funders of open access publications streamline their metadata monitoring.

  • Become Metadata Compliant

    Independently verify that your publications are compliant with the metadata requirements of library consortia and funding agencies.
  • Identify Areas of Improvement

    Identify nonconforming metadata and target the publishers and publications to best improve your compliance.
  • Receive Emailed Reports

    Get high-quality reports parametrised for your institution and funded publications by email.
  • Dig Down into your Data

    Use generated spreadsheets with your own tools and analyses.
  • Get Answers

    Rely on the community of users and experts to interpret results and troubleshoot noncompliant or missing metadata.
    FAQs are coming soon.
  • Use Open Source for Open Access

    Co-create value for the community by using a compliance tool powered by open source software and based on open data.
    Contributions are welcome.

Technical Implementation

The Open Access Metadata Compliance Checker is powered by metacheck, an R package.

The package includes:

  • compliance checks
  • a parametrised rmarkdown compliance report
  • a webapp to send e-mail reports
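A minimal way to try the package locally might look like this (the repository path is taken from the Actions URLs in the issues below, and `runMetacheck()` is the entry point named there; verify both before relying on them):

```shell
# sketch: install from GitHub and launch the webapp locally
# (assumes the remotes package is installed; repo path and entry point
# are taken from links elsewhere in this document, not verified)
Rscript -e 'remotes::install_github("subugoe/metacheck")'
Rscript -e 'metacheck::runMetacheck()'
```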

metacheck's People

Contributors

maxheld83, njahn82


metacheck's Issues

retrieve email send status from smtp relay

Currently, we don't really hear back about what the SMTP relay (Mailjet, at the moment) does with the email, i.e. when/if it is actually sent, bounced, or otherwise handled.

AFAIK, what we currently hear from mailjet is merely that mailjet has received our request.

It would be nice to have access to that and use that status:

  • in testing
  • in the UI (i.e. tell users when their email has actually been sent by the smtp)

This kind of bit me just now: Mailjet seemed to be working through a queue for about 20 minutes, sending no email (I couldn't find them in gmail or gwdg), and then a bunch came through all at once.
This kind of thing could/should be tested once we get a proper status response.
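One possible route, assuming Mailjet's v3 REST API (the endpoint and field names here are taken from memory of the Mailjet docs and should be verified): poll the message resource after submission and surface its status.

```shell
# sketch: poll Mailjet for the status of recently submitted messages
# (assumes MJ_APIKEY_PUBLIC / MJ_APIKEY_PRIVATE are set and jq is installed;
# endpoint and fields per Mailjet's v3 REST docs -- verify before relying on this)
curl -s --user "$MJ_APIKEY_PUBLIC:$MJ_APIKEY_PRIVATE" \
  "https://api.mailjet.com/v3/REST/message?Limit=10" \
  | jq '.Data[] | {ID, Status}'
```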

use separate API tokens

Aside from security best practices, there may be another reason to use separate API tokens for separate people and, especially, services: rate limits.

Crossref states for Metadata Plus that:

Rate limiting of the API is primarily on a per access token basis. If a method allows, for example, for 75 requests per rate limit window, then it allows 75 requests per window per access token. This number can depend on the system state and may need to change. If it does, Crossref will publish it in the response headers.
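The headers Crossref mentions can be inspected directly; a minimal sketch against the public API (for Metadata Plus, the token would go in a `Crossref-Plus-API-Token` header, per the Crossref docs):

```shell
# sketch: print the rate-limit headers Crossref sends with every response
curl -s -D - -o /dev/null "https://api.crossref.org/works?rows=0" \
  | grep -i '^x-rate-limit'
# typically something like:
#   x-rate-limit-limit: 50
#   x-rate-limit-interval: 1s
```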

The problem with us (people) and several services (Azure, GitHub Actions) sharing the same token is that one of these users might (accidentally) exhaust the rate limit at the expense of another user or machine.
This can easily happen, because it is generally OK to go up to the rate limit on any individual machine or service.
As a result, seemingly unrelated services or other users' queries may break intermittently, which could be quite surprising and hard to debug.

This is unlikely to be an issue initially, but may well become one eventually, and should be addressed head-on with at least one token per user and service.

Depending on our scaling Azure may even need several tokens, or the shiny app must ask Azure how many instances there currently are and then share/divide the rate limit accordingly.
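The "share/divide" idea is simple arithmetic; a sketch using the 75-requests-per-window figure from the Crossref quote above and a hypothetical instance count (in practice, the count would come from querying Azure):

```shell
# sketch: split a shared per-token limit evenly across running instances
# (the instance count is hardcoded here; a real version would ask Azure)
TOTAL_LIMIT=75
INSTANCES=3
PER_INSTANCE=$(( TOTAL_LIMIT / INSTANCES ))
echo "each instance may send $PER_INSTANCE requests per window"
```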

As a (slow) workaround, falling back to the open api might help #36

add test where runMetacheck is run inside of runtime container

haha, this was a fun one just now solved in https://github.com/subugoe/metacheck/actions/runs/413556533.

There was apparently (?) a missing dependency, which never came up before because:

  • it was only needed when there were missing DOIs, which did not come up in much of the testing
  • the dependency (gluedown) is probably part of muggle-buildtime anyway (?), so it never came up in GitHub Actions testing
  • inside Azure, muggle-runtime probably did not have it, and that's why it failed.

This is a pretty bad failure mode.
Fixes:

  • also run a (smoke?) test on muggle-runtime in GitHub Actions (though even the entrypoint wouldn't have caught that?!)
  • don't use any dependencies in the rmarkdown template, because forgetting them doesn't get caught by R CMD check (as per #52).
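The first fix could be a single extra CI step; a sketch, where the image name and template path are placeholders (the real runtime image reference and the template's installed location would need to be filled in):

```shell
# sketch: smoke-test inside the *runtime* image in CI, so runtime-only missing
# dependencies (like the gluedown case above) fail the build, not production.
# Image name and template path are placeholders.
docker run --rm subugoe/metacheck-runtime:latest Rscript -e '
  template <- system.file("rmarkdown", "report.Rmd", package = "metacheck")
  rmarkdown::render(template)  # rendering exercises the template dependencies
' || { echo "runtime smoke test failed"; exit 1; }
```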

deploy to google cloud run (or similar batch service)

Since the creation and sending of the email is actually stateless (i.e. we do not need a continuous connection to the client), we can do 95% of this on a (much cheaper/easier) service such as Google Cloud Run.
This also works with the muggle containers, so there should be very little extra DevOps work.
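Deploying the existing container there is essentially a one-liner with the gcloud CLI; a sketch with placeholder project, region, and image values:

```shell
# sketch: deploy the (muggle-based) runtime container to Cloud Run
# (PROJECT_ID, region, and image tag are placeholders)
gcloud run deploy metacheck \
  --image "gcr.io/PROJECT_ID/metacheck:latest" \
  --platform managed \
  --region europe-west3 \
  --memory 1Gi
```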

internationalise landing page

The whole multilingual setup is a bit of a mess currently.
There is, for example, currently no easy way to have a landing page in German without English artefacts.
The sites are also not properly reported as "translations" of other sites.

It might actually just be easiest to do all of this in English.

render asynchronously

Rendering the email and other expensive operations should run asynchronously, so as not to block the Shiny session.
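In shell terms, the idea is just moving the expensive step into a background process; in Shiny itself, the promises/future packages would be the idiomatic route (an assumption on our part, not something decided here):

```shell
# toy sketch: do the expensive render in a background process so the
# foreground (standing in for the user's session) stays responsive
render_report() { sleep 1; echo "report rendered"; }  # stand-in for rmarkdown::render()
render_report &          # kick off in the background
echo "session still responsive"
wait                     # in Shiny, a promise callback would fire here instead
```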

refactor ELT + analysis codebase

this is a bit of a big issue, but just to keep track of what should be factored out:

  • the rmarkdown template (for the mail) strictly should not do any work, but should merely define the presentation of results (i.e. the order of tables and perhaps some commentary).
    • Among other things, the rmarkdown template currently creates the excel output, which makes this quite brittle (and required the hack-fix for #44).
    • the substantive documentation in the template (i.e. what this or that means, not just "hello" boilerplate) should live in the roxygen2 documentation for the respective functions.
      We can then figure out a way to dynamically pull this into the rmarkdown template.
      (I know that it's easily possible to pull random rmarkdown chunks into the roxygen docs, but it would be even more expressive to do it the other way around -- will have to investigate.)
      The logic here is that this kind of info really belongs with the functions, where it can be maintained together, tested, etc.
    • no dependencies in rmarkdown templates, or at least not without programmatically rendering the template as part of the check.
      If there is a missing dependency, as in https://github.com/subugoe/metacheck/actions/runs/413556533, this creates a pretty thorny failure mode, because the missing dep isn't caught by R CMD check.
  • simply no R code that isn't a function and isn't covered by at least R CMD check and friends, ideally test() as well.
    tbc.
    tbc.

add legal disclaimer

@njahn82 some (IANAL) thoughts on the legal issues just raised:

  1. I'd hope that we can somehow rely on the MIT license (which the package is also under) in using the web UI.
    That expressly excludes any kind of warranty or guarantees, and has been thoroughly tested to that effect.
  2. If we can retrieve and store the RORs for all/most users, we might drop the email address as soon as we've sent out the email.
    I'd imagine that in that situation, we'd only store RORs and DOIs, neither of which is personal information, and thus perhaps the GDPR regime wouldn't even apply.

change name

Can we change the name @njahn82? (Sorry, bit of a pet peeve).

This covers non-hybrid publications as well, correct?

Would just metacheck work? It's short, available::available() everywhere, and covers the core of what it's doing.

I know this isn't super important, but it's easiest to do this at the beginning, before we have to change it in a bunch of places.

pass on secrets to azure

Azure secrets are currently pasted manually into portal.azure.com; that's not ideal.
Might look into Azure Key Vault or something similar.
