pyorcidator's Issues

Avoid code repetition between import_info and import_info_from_list

Since both modules produce essentially the same result, reading from a list of ORCIDs could be handled by a type guard in import_info.py: if the argument is a Path() that exists, read the file and loop over its ORCIDs; if it is a string, pass it directly to render_orcid_qs. I believe this would make the code simpler and cleaner.
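A minimal sketch of that type guard, assuming the list file holds one ORCID per line. The helper name resolve_orcids is hypothetical and is not part of pyorcidator; each returned ORCID would then be handed to render_orcid_qs.

```python
from pathlib import Path
from typing import List, Union


def resolve_orcids(source: Union[str, Path]) -> List[str]:
    """Return the ORCIDs to process (hypothetical helper, not in pyorcidator)."""
    path = Path(source)
    if path.is_file():
        # Assumption: the file lists one ORCID per line; blank lines are ignored.
        lines = path.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    # Anything that is not an existing file is treated as a single ORCID.
    return [str(source).strip()]
```

With this in place, a single command could loop over resolve_orcids(argument) regardless of which kind of input it received.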

Add logic to look for disambiguations on Wikidata

For example:

"organization": {
    "name": "University of Regensburg",
    "address": {
        "city": "Regensburg",
        "region": "Bayern",
        "country": "DE"
    },
    "disambiguated-organization": {
        "disambiguated-organization-identifier": "grid.7727.5",
        "disambiguation-source": "GRID"
    }
}

The code should look up the ID on Wikidata before asking the user for a key.
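One way this lookup could start is by turning the disambiguated-organization block into a SPARQL query. The sketch below only builds the query string; the function name and the mapping are hypothetical. It assumes the Wikidata property for GRID IDs is P2427, and other sources would need their own entries, possibly with normalization of the identifier.

```python
from typing import Optional

# Assumption: P2427 is the Wikidata property for GRID IDs.
# Other disambiguation sources are deliberately left out of this sketch.
SOURCE_TO_PROPERTY = {"GRID": "P2427"}


def build_lookup_query(disambiguated: Optional[dict]) -> Optional[str]:
    """Build a SPARQL query for the organization, or None if not possible."""
    if not disambiguated:
        return None
    prop = SOURCE_TO_PROPERTY.get(disambiguated.get("disambiguation-source"))
    identifier = disambiguated.get("disambiguated-organization-identifier")
    if prop is None or not identifier:
        return None
    # Escape backslashes and quotes so the identifier stays a valid literal.
    escaped = identifier.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:{prop} "{escaped}" . }} LIMIT 1'
```

If the query returns no item, or the function returns None, the code would fall back to asking the user for a key as it does today.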

get_organization_list not returning full list of organizations

In the branch tests/get_org_list_bug, I built a simple test to check whether helper.get_organization_list returns all organizations in the sample data. However, it currently returns only 2 of the 4 organizations (output of pytest -v):

E       AssertionError: assert ['Harvard Med...l', 'Q152171'] == ['Harvard Med...unhofer SCAI']
E         At index 1 diff: 'Q152171' != 'Enveda Biosciences'
E         Right contains 2 more items, first extra item: 'Q152171'
E         Full diff:
E         - ['Harvard Medical School', 'Enveda Biosciences', 'Q152171', 'Fraunhofer SCAI']
E         + ['Harvard Medical School', 'Q152171']

As you can see, the function only returns the first and third employment entries. I believe this piece of code is the reason:

if a["disambiguated-organization"] is None:
    continue

If disambiguated-organization is None, which is the case for the second and fourth entries in the expected list above, the loop jumps to the next entry, so that entry never makes it into the returned list. So, what's the reason behind this line? Is there something I'm missing here? Thanks
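For discussion, here is a sketch of a loop that keeps every entry and records the identifier only when one exists. The function name and return shape are hypothetical, and the real structure in helper.py may differ; it assumes each employment entry has an "organization" dict with a "name" key.

```python
from typing import List, Optional, Tuple


def list_organizations(employments: List[dict]) -> List[Tuple[str, Optional[str]]]:
    """Return (name, identifier) for every entry; identifier may be None."""
    organizations = []
    for entry in employments:
        organization = entry["organization"]
        # .get covers both a missing key and an explicit None value.
        disambiguated = organization.get("disambiguated-organization")
        identifier = None
        if disambiguated is not None:
            identifier = disambiguated.get("disambiguated-organization-identifier")
        # No `continue` here: entries without disambiguation are still returned.
        organizations.append((organization["name"], identifier))
    return organizations
```

Callers could then decide what to do with entries whose identifier is None, instead of losing them inside the helper.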

Think about implementing a test suite

I believe testing is a good thing to work on going forward, since the base functionality of the code is, more or less, achieved. As we implement new features and whatnot, it'd be a nice safety net to have an accompanying test suite, so we don't break the basics. I'd recommend going with pytest, but Python's own testing module, unittest, is pretty good too.

Improve handling and skipping when term not present in Wikidata

Sometimes a term is not on Wikidata and we'd rather just exclude it instead of going through the effort of creating a new entry.

Not sure how to implement this:

  • special dict value ("SKIP") that is skipped?
  • flag that skips any missing when running import?
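Both options could coexist. The sketch below is only an illustration with hypothetical names (SKIP, resolve_terms, skip_missing); it assumes the dictionaries map a term to a Wikidata QID string.

```python
from typing import Dict, List, Tuple

SKIP = "SKIP"  # hypothetical sentinel stored in the dictionary for excluded terms


def resolve_terms(
    terms: List[str], mapping: Dict[str, str], skip_missing: bool = False
) -> Tuple[List[str], List[str]]:
    """Return (resolved QIDs, terms that still need a decision from the user)."""
    resolved, unresolved = [], []
    for term in terms:
        value = mapping.get(term)
        if value == SKIP:
            # Option 1: the term was explicitly marked as "do not import".
            continue
        if value is None:
            # Option 2: with the flag set, unknown terms are silently dropped.
            if not skip_missing:
                unresolved.append(term)
            continue
        resolved.append(value)
    return resolved, unresolved
```

The sentinel records a permanent decision for one term, while the flag applies to a single run, so they cover slightly different needs.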

Enable creating Wikidata record for a given ORCID non-interactively

While the goal of this repo is to have a human-in-the-loop to help disambiguate affiliations and other aspects of ORCID pages, it would be nice to have a non-interactive mode that allows for a given ORCID to just do a simple set of things (instance of, occupation, ORCID annotation) and some other high-confidence resolution of affiliations.

Related to cthoyt/orcidio#7

403 error when using import or import_list

Sometimes when we run pyorcidator import or import_list, it raises an HTTP 403 Forbidden error:

File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

The stack trace appears to lead to the SPARQLWrapper call in the helper.lookup_id function.
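One possible cause, not confirmed for this issue, is that the Wikidata query service rejects requests carrying a generic default User-Agent. The sketch below uses only the standard library to show a request with a descriptive User-Agent; the agent string and contact URL are placeholders, and nothing is sent over the network. SPARQLWrapper's constructor also accepts an agent argument that could serve the same purpose.

```python
from urllib.parse import urlencode
from urllib.request import Request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder value: a real tool should name itself and give a contact point.
USER_AGENT = "pyorcidator-example/0.0 (https://example.org/contact)"


def build_sparql_request(query: str) -> Request:
    """Build (but do not send) a GET request with a descriptive User-Agent."""
    params = urlencode({"query": query, "format": "json"})
    headers = {
        "User-Agent": USER_AGENT,
        "Accept": "application/sparql-results+json",
    }
    return Request(f"{WDQS_ENDPOINT}?{params}", headers=headers)
```

If the 403 persists with a descriptive agent, rate limiting would be the next thing to check.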

ModuleNotFoundError: No module named 'pyorcidator'

@cthoyt I tried to install pyorcidator as a package by running pip3 install . in a virtual env. When I try to call the function, it gives the following error:

Traceback (most recent call last):
  File "/home/lubianat/Documents/main_venv/bin/pyorcidator", line 5, in <module>
    from pyorcidator.cli import cli
ModuleNotFoundError: No module named 'pyorcidator'

It does seem to be installed, though, since the entry point exists at /home/lubianat/Documents/main_venv/bin/pyorcidator. Do you know what might be happening?

Split QuickStatements datamodel into own package?

Hi @lubianat @jvfe, would it be alright if I split the code I wrote for the QuickStatements data model into its own stand-alone package? I also want to write a full-fledged client for interacting with the QuickStatements API, but this isn't the core goal of this package.

I'll keep compatibility with the current interface, so that the code in this repository can later be replaced with the code from the new package.

Pre-populate degree dictionary

Using a SPARQL query to get all subclasses of academic title (Q3529618) would be a nice way to pre-populate degrees.json. The following SPARQL query (run at https://w.wiki/5o9H) gets the job done:

SELECT ?itemLabel ?item
WHERE {
  ?item wdt:P279* wd:Q3529618 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Caveats:

  • This should be extended to multiple languages
  • Some labels are empty; those should be filtered out either in SPARQL or in post-processing (I realize this was likely because those items have no English label)
  • There might be other terms besides academic title that are relevant, but this seems like a pretty good start
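The post-processing filter mentioned in the caveats could look like the sketch below. It assumes the standard SPARQL JSON results layout (results.bindings with item and itemLabel values) and assumes degrees.json maps a label to a QID; the function name is hypothetical. It also drops labels that are just a QID, in case the label service falls back to the identifier when no label exists.

```python
import re
from typing import Dict, List

QID_PATTERN = re.compile(r"^Q\d+$")


def bindings_to_degrees(bindings: List[dict]) -> Dict[str, str]:
    """Map label -> QID, skipping rows without a usable label."""
    degrees: Dict[str, str] = {}
    for row in bindings:
        # Entity URIs end in the QID, e.g. ".../entity/Q3529618".
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        label = row.get("itemLabel", {}).get("value", "").strip()
        if not label or QID_PATTERN.match(label):
            continue
        # Keep the first QID seen for a label rather than overwriting it.
        degrees.setdefault(label, qid)
    return degrees
```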

Alternate Multi-lingual SPARQL

SELECT DISTINCT ?label ?item
WHERE {
  ?item wdt:P279* wd:Q3529618 .
  ?item rdfs:label ?label .
}

Note that DISTINCT doesn't collapse entries that are tagged with different languages but have the same text.
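Those duplicates could be collapsed after the query returns. A small sketch, assuming the standard SPARQL JSON results layout with the ?label and ?item variables from the query above; the function name is hypothetical.

```python
from typing import List, Tuple


def collapse_labels(bindings: List[dict]) -> List[Tuple[str, str]]:
    """Return unique (label text, QID) pairs, ignoring the language tag."""
    seen = set()
    pairs = []
    for row in bindings:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        text = row["label"]["value"]
        # The language tag (row["label"].get("xml:lang")) is deliberately not
        # part of the key, so identical text in two languages collapses to one.
        key = (text, qid)
        if key in seen:
            continue
        seen.add(key)
        pairs.append(key)
    return pairs
```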

Implement data model for quickstatements

Right now, the construction of the QuickStatements text is really hard to understand, and therefore hard to debug or extend. I would suggest creating a data model (i.e., a set of interconnected classes) that helps construct QuickStatements programmatically and also implements serialization to text.
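To make the suggestion concrete, here is a deliberately small sketch of what such a data model might look like. The class names are hypothetical and this is not the design that was eventually implemented. It assumes the tab-separated QuickStatements V1 syntax, in which string values are wrapped in double quotes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Statement:
    """One subject/property/value line (hypothetical class)."""

    subject: str
    predicate: str
    target: str
    is_string: bool = False

    def to_line(self) -> str:
        # Entity targets are written bare; string targets are quoted.
        value = f'"{self.target}"' if self.is_string else self.target
        return "\t".join((self.subject, self.predicate, value))


@dataclass
class Batch:
    """An ordered collection of statements that serializes to text."""

    statements: List[Statement] = field(default_factory=list)

    def add(self, statement: Statement) -> None:
        self.statements.append(statement)

    def to_text(self) -> str:
        return "\n".join(s.to_line() for s in self.statements)


batch = Batch()
batch.add(Statement("Q4115189", "P31", "Q5"))  # instance of: human
batch.add(Statement("Q4115189", "P496", "0000-0002-1825-0097", is_string=True))
print(batch.to_text())
```

Keeping serialization in one method per class means the text format can be tested and changed without touching the code that decides which statements to emit.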

Describe in the README what it actually does

For example: Does it add missing papers? Is that optional? Does it try to follow the website links provided on the profile and extract, for example, a Twitter account or a Google Scholar account? Does it link the (new) ORCID profile to existing articles?

Include black in the CI

Add CI check to see if the code is properly formatted with black

@jvfe about time we blackened the whole project ;)

For sure, I'll probably include black in the CI too, just to be safe. But that's for another PR.

Originally posted by @jvfe in #41 (comment)
