Comments (7)

broeder-j avatar broeder-j commented on May 19, 2024

Yes, it does: https://info.orcid.org/faq/how-do-i-find-orcid-record-holders-at-my-institution/
But (and this is where it could go wrong): users' emails are not visible to the outside by default; a member has to change the visibility of each email individually to either internal or public. Only if people have done this do you have a chance of finding them by email via an authorized query to the API. I think most people do not change the default, so I expect this approach to yield about 10%. (Test query: https://pub.orcid.org/v3.0/csv-search/?q=affiliation-org-name:ORCID&fl=orcid,given-names,family-name,current-institution-affiliation-name,email)
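
For illustration, the test query above can be assembled programmatically. This is only a sketch: the endpoint and field names are taken from that test query, while the helper name `build_csv_search_url` is made up here, and no request is actually sent.

```python
from urllib.parse import urlencode

# Endpoint taken from the test query in this comment.
ORCID_CSV_SEARCH = "https://pub.orcid.org/v3.0/csv-search/"


def build_csv_search_url(query, fields):
    """Return a csv-search URL for a query string and a list of output columns.

    Hypothetical helper for illustration; it only builds the URL.
    """
    params = {"q": query, "fl": ",".join(fields)}
    # Keep ':' and ',' unescaped so the URL stays readable.
    return ORCID_CSV_SEARCH + "?" + urlencode(params, safe=":,")


url = build_csv_search_url(
    "affiliation-org-name:ORCID",
    [
        "orcid",
        "given-names",
        "family-name",
        "current-institution-affiliation-name",
        "email",
    ],
)
print(url)
```

The `email` column will only be filled for records whose owners made an email address visible, which is the limitation described above.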

A better way could be to find people by name plus affiliation, i.e. institution name or identifier.
Here codemetapy probably only has a chance if the institution is given or if it can get it from the metadata already there...
How to do this I do not know, since contributors can be from anywhere; maybe a first step would be to allow for a list to try.
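
A rough sketch of what a name-plus-affiliation lookup could look like, assuming the ORCID search fields `given-names`, `family-name` and `affiliation-org-name`, and reading "a list to try" as a list of candidate institution names. The function names are hypothetical and nothing here is part of codemetapy.

```python
def name_affiliation_query(given, family, organisation=None):
    """Build a search query string from a name and an optional institution name."""

    def quote(value):
        # Quote the value as a phrase and escape characters that would end it.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    parts = [f"given-names:{quote(given)}", f"family-name:{quote(family)}"]
    if organisation:
        parts.append(f"affiliation-org-name:{quote(organisation)}")
    return " AND ".join(parts)


def candidate_queries(given, family, organisations):
    """One query per institution in the user-supplied list, in the given order."""
    return [name_affiliation_query(given, family, org) for org in organisations]


for query in candidate_queries("Jane", "Doe", ["Example University", "Example Institute"]):
    print(query)
```

Each query string would then be passed as the `q` parameter of a search request; a name match without a matching affiliation should probably be treated as a candidate only, not as a confirmed identity.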

Let me know if you plan to work on this. I have an outline of what I want, but I have not implemented anything yet and it is currently not on my to-do list.

In terms of existing code, I found these, which are old and may or may not work:
https://github.com/ORCID/python-orcid
https://github.com/scholrly/orcid-python

from codemetapy.

proycon avatar proycon commented on May 19, 2024

> emails of users are per default not visible to the outside, a member has to upgrade this to either internal or public on a per email level. So only if people have done this you have a chance to find them via an authorized query to the API by email. I think most people do not change the default, so i expect this way to yield 10%.

Too bad, this would be the ideal method, but if it yields only 10% it is indeed not very useful.

> A better way could be to find people over name, plus affiliation, i.e. institution name or identifier.

That sounds viable yes, though one issue with affiliations is that people tend to come and go in institutions.

> ..maybe a first thing would be to allow for a list to try.

Like explicitly passing a tsv file to codemetapy with, say, emails and ORCIDs? That would work, yes, though it isn't as fully automated as we'd ideally want.
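
As a minimal sketch of that idea, assuming a two-column tsv file (email, then ORCID iD): the loader below is hypothetical and not an existing codemetapy option, and the iD in the sample is the fictitious example record that ORCID uses in its own documentation.

```python
import csv
import io


def load_orcid_mapping(tsv_file):
    """Read email<TAB>orcid rows into a dict keyed by lower-cased email."""
    mapping = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        # Skip blank lines, incomplete rows and '#' comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        email, orcid = row[0].strip().lower(), row[1].strip()
        if email and orcid:
            mapping[email] = orcid
    return mapping


sample = io.StringIO(
    "# email\torcid\n"
    "Josiah.Carberry@example.org\t0000-0002-1825-0097\n"
    "\n"
)
mapping = load_orcid_mapping(sample)
print(mapping)
```

Lower-casing the email keeps lookups independent of how the address was capitalised in the commit metadata.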


broeder-j avatar broeder-j commented on May 19, 2024

An addition to this: codemetapy parses the CITATION.cff file, but it does not use the ORCIDs in there as the authors'/contributors' IDs; instead it uses the gitlab id (account page): "@id": "https://iffgit.fz-juelich.de/fleur/fleur/person/cmax347".

Ideally one would keep both pieces of information, i.e. state somewhere that the ORCID and the git id refer to the same person.

Also in that context, the familyName and givenName parsing is not optimal if the person's link does not contain the name. Example:

    {
        "@id": "https://iffgit.fz-juelich.de/fleur/fleur/person/cmax347",
        "@type": "Person",
        "email": "[email protected]@gmail.com",
        "familyName": "",
        "givenName": "cMax347",
        "position": 71
    },
    {
        "@id": "https://iffgit.fz-juelich.de/fleur/fleur/person/christian-roman-gerhorst",
        "@type": "Person",
        "email": "[email protected]",
        "familyName": "Gerhorst",
        "givenName": "Christian-Roman",
        "position": 72
    }

So it also has problems with middle names. I would assume that these are easier to parse from a CITATION.cff file.
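
To make the request concrete, here is a sketch of how one CITATION.cff author entry could be mapped to a Person that keeps both identifiers. It assumes the CFF author keys `given-names`, `family-names`, `email` and `orcid`; using `sameAs` for the second identifier is my assumption, and the function is not codemetapy's actual conversion code.

```python
def cff_author_to_person(author, fallback_id=None):
    """Map one CFF author dict to a schema.org-style Person dict.

    The ORCID becomes the @id when present; the other identifier
    (for example the generated person URL) is kept alongside it.
    """
    person = {"@type": "Person"}
    orcid = author.get("orcid")
    if orcid:
        person["@id"] = orcid
        if fallback_id:
            person["sameAs"] = fallback_id  # assumption: sameAs links both ids
    elif fallback_id:
        person["@id"] = fallback_id
    # CFF already separates given and family names, so no guessing is needed.
    if author.get("given-names"):
        person["givenName"] = author["given-names"]
    if author.get("family-names"):
        person["familyName"] = author["family-names"]
    if author.get("email"):
        person["email"] = author["email"]
    return person


# A dict as a YAML parser would return it for one entry under `authors:`.
# The ORCID is the fictitious example record from the ORCID documentation.
author = {
    "given-names": "Josiah",
    "family-names": "Carberry",
    "orcid": "https://orcid.org/0000-0002-1825-0097",
}
person = cff_author_to_person(author, fallback_id="https://example.org/person/jcarberry")
print(person)
```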


proycon avatar proycon commented on May 19, 2024

> An add on to this. codemetapy parses the Citation.cff file, but it does not use the orcids in there for authors/contributors Ids

Hmm.. Agreed, if there are ORCIDs then they shouldn't be overwritten. I wonder if it's an issue in codemetapy or in https://github.com/citation-file-format/cff-converter-python; we don't do the CITATION.cff parsing ourselves.

> but instead the gitlab id (account page) "@id": "https://iffgit.fz-juelich.de/fleur/fleur/person/cmax347".

(it's not the gitlab id, see #34)

> Ideally once would keep both information... i.e that the orcid and the git id are same as somewhere.


proycon avatar proycon commented on May 19, 2024

> also in that context the familyName and givenName parsing is also not optimal if the link of the person does not contain the name, example:
>
>     {
>         "@id": "https://iffgit.fz-juelich.de/fleur/fleur/person/cmax347",
>         "@type": "Person",
>         "email": "[email protected]@gmail.com",
>         "familyName": "",
>         "givenName": "cMax347",
>         "position": 71
>     },

Yes, we'd better just use schema:name if we can't decipher the given and family names; this needs some fine-tuning. That e-mail looks malformed too.
For the actual name parsing from arbitrary strings I'm using nameparser.
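
The fallback could look roughly like the sketch below. It deliberately uses a simple standard-library heuristic instead of nameparser so that it stays self-contained; it is not how codemetapy currently behaves, and the function name is made up.

```python
def person_name_properties(raw):
    """Return givenName/familyName when both can be guessed, else just name."""
    text = " ".join(raw.split())  # collapse runs of whitespace
    if not text:
        return {}
    # "Family, Given" form.
    if "," in text:
        family, _, given = text.partition(",")
        family, given = family.strip(), given.strip()
        if family and given:
            return {"givenName": given, "familyName": family}
    # "Given [Middle ...] Family" form: the last token is taken as family name.
    tokens = text.split(" ")
    if len(tokens) >= 2:
        return {"givenName": " ".join(tokens[:-1]), "familyName": tokens[-1]}
    # A single token such as a user handle: do not pretend it is a given name.
    return {"name": text}


print(person_name_properties("cMax347"))
print(person_name_properties("Christian-Roman Gerhorst"))
```

With this, a handle like the one in the example ends up in `name` rather than in `givenName` with an empty `familyName`.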


proycon avatar proycon commented on May 19, 2024

I've been giving this some more thought and there are some challenges to solve, mostly related to 'affiliations':

  1. In the current implementation, whenever an author appears in multiple
    software metadata projects (or even multiple times in the same one), there
    is a high risk of properties getting conflated if they are not consistently named.
    The most notable one is 'affiliation': if an author has different affiliations
    at various points (or even the same one, but not consistently named), then these
    will all be propagated to all instances when the full graph of multiple software
    projects is loaded.
  2. Related to the above: 'affiliation' is a property of a schema:Person. But
    that means it is no longer attached to any specific software project,
    meaning we can't differentiate between affiliations at the time of the
    software project and those before or after it. We'd always get all of them,
    which may be less informative than desired. It's common for people to have
    (had) multiple affiliations throughout their career. We do use
    schema:producer to tie software projects to institutions directly, so at
    least that is expressible (relates to codemeta/codemeta#286).
  3. We already ascertained that automatically going from names or e-mails to
    ORCIDs is hard. We probably need a custom mapping as input (like a tsv
    file).
  4. The reverse, going from ORCIDs to all the names/emails/urls, is fairly easy: we can
    just query orcid.org and request application/ld+json to get a schema.org
    representation that is compatible with codemeta. Some caveats there:
    * It does not contain the e-mail, even if it is public. The turtle
      output, however, does (it uses a completely different vocabulary than
      the JSON-LD serialisation).
    * The JSON-LD output lists all affiliations it knows, including those
      that have ended, without indicating that they have ended. The turtle
      output lists no affiliations at all.
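
The lookup in point 4 can be sketched with plain content negotiation. The helper names are hypothetical and `text/turtle` as the media type for the turtle output is an assumption; the example only builds the request and does not contact orcid.org, and the iD is the fictitious example record from the ORCID documentation.

```python
import json
from urllib.request import Request, urlopen


def orcid_request(orcid, mime_type="application/ld+json"):
    """Build a request for an ORCID record in the requested serialisation."""
    return Request(f"https://orcid.org/{orcid}", headers={"Accept": mime_type})


def fetch_orcid_jsonld(orcid):
    """Fetch and decode the JSON-LD (schema.org) representation of a record.

    Performs a network request, so it is not called in this example.
    """
    with urlopen(orcid_request(orcid)) as response:
        return json.load(response)


req = orcid_request("0000-0002-1825-0097")
print(req.full_url, req.get_header("Accept"))

# Same record, asking for turtle instead (assumed media type).
turtle_req = orcid_request("0000-0002-1825-0097", mime_type="text/turtle")
```

Given the caveats above, a caller that needs both the e-mail and the affiliations would have to combine the two serialisations.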

