lubianat / pyorcidator (License: MIT)
Since the end result of the two modules is essentially the same, the desired functionality of reading from a list of ORCIDs could be achieved with a type guard in import_info.py: if the argument provided is a Path() that exists, read it and loop through the ORCIDs; if it is a string, just send it to render_orcid_qs. I believe this would make the code simpler and cleaner.
Currently, if an ORCID user's social links tab contains a URL without 'https://', the link isn't recognized by pyorcidator. For example, the GitHub link for https://orcid.org/0000-0001-7542-0286 (github.com/egonw). This could probably be solved by using a regex instead of the current rv[key] = url[len(url_prefix):]. I might send a possible solution soon.
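One possible direction, as a sketch: make the scheme and "www." prefixes optional in a regex per site. The dictionary keys and function name below are illustrative, not pyorcidator's actual internals.

```python
import re

# Hypothetical patterns: scheme and "www." are optional, so a bare
# "github.com/egonw" matches just as well as the full https:// URL.
LINK_PATTERNS = {
    "github": re.compile(r"^(?:https?://)?(?:www\.)?github\.com/([\w.-]+)"),
    "twitter": re.compile(r"^(?:https?://)?(?:www\.)?twitter\.com/([\w.-]+)"),
}


def extract_username(url: str) -> dict:
    """Return {site: username} for any recognized social link."""
    rv = {}
    for key, pattern in LINK_PATTERNS.items():
        match = pattern.match(url.strip())
        if match:
            rv[key] = match.group(1)
    return rv


print(extract_username("github.com/egonw"))  # {'github': 'egonw'}
```

This also keeps the username extraction in one place, so adding a new site is just one more pattern.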
e.g.:
"organization": {
"name": "University of Regensburg",
"address": {
"city": "Regensburg",
"region": "Bayern",
"country": "DE"
},
"disambiguated-organization": {
"disambiguated-organization-identifier": "grid.7727.5",
"disambiguation-source": "GRID"
}
}
The code should look up the id on Wikidata before asking the user for the key.
Currently, the precision for month-only dates is not set correctly.
See https://www.wikidata.org/w/index.php?title=Q90094381&oldid=1750716604 and https://orcid.org/0000-0002-6500-856X .
The date is being added as the first day of the month, which asserts a false precision.
I've created a package for basic Wikidata curation tools:
I use them across various projects. I'll adapt pyorcidator to use functions from that package, but I am a bit unsure on how to best do that.
Perhaps I'll push a version of wdcuration to PyPI and add it as a requirement. Does anyone have thoughts on that?
In the branch tests/get_org_list_bug I built a simple test to check whether helper.get_organization_list could return a list of all organizations in the sample data. However, it currently only returns 2 of the 4 organizations. Output of pytest -v:
E AssertionError: assert ['Harvard Med...l', 'Q152171'] == ['Harvard Med...unhofer SCAI']
E At index 1 diff: 'Q152171' != 'Enveda Biosciences'
E Right contains 2 more items, first extra item: 'Q152171'
E Full diff:
E - ['Harvard Medical School', 'Enveda Biosciences', 'Q152171', 'Fraunhofer SCAI']
E + ['Harvard Medical School', 'Q152171']
As you can see, the function only returns the first and third employment entries.
I believe this piece of code is the reason:
if a["disambiguated-organization"] is None:
continue
If there is no disambiguated-organization for an entry (which is the case for the second and fourth entries shown above), it jumps to the next entry in the list instead of returning that entry at the end. So, what's the reason behind this line? Is there something I'm missing here? Thanks
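If the skip is not intentional, one option is to fall back to the plain organization name when the disambiguated id is absent. A sketch only; the real helper's structure and return values may differ:

```python
def get_organization_list(employments: list) -> list:
    """Return one label per employment entry instead of skipping some.

    Entries carrying a disambiguated id use it (in the real code this would
    presumably be resolved to a Wikidata QID); entries without one fall back
    to the plain organization name rather than being dropped via `continue`.
    """
    organizations = []
    for entry in employments:
        org = entry["organization"]
        disambiguated = org.get("disambiguated-organization")
        if disambiguated is not None:
            organizations.append(disambiguated["disambiguated-organization-identifier"])
        else:
            organizations.append(org["name"])
    return organizations
```

That would make the test above return all four organizations, at the cost of mixing resolved ids with raw names, which may or may not be what the original author intended.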
Dates are currently only parsed for education.
e.g. Rodrigo Dalmolin
Rodrigo - https://www.wikidata.org/wiki/Q4927979 (male given name)
Dalmolin - https://www.wikidata.org/wiki/Q37464573 (family name)
From male given name, infer sex or gender (P21) --> male
Watch out for:
I believe testing is a good thing to work on going forward, since the base functionality of the code is more or less achieved. As we implement new features, it would be a nice safety net to have an adjacent test suite, so we don't break the basics. I'd recommend going with pytest, but Python's own testing module, unittest, is pretty good too.
Sometimes a term is not on Wikidata and we'd rather just exclude it instead of going through the effort of creating a new entry.
Not sure how to implement:
While the goal of this repo is to have a human-in-the-loop to help disambiguate affiliations and other aspects of ORCID pages, it would be nice to have a non-interactive mode that allows for a given ORCID to just do a simple set of things (instance of, occupation, ORCID annotation) and some other high-confidence resolution of affiliations.
Related to cthoyt/orcidio#7
Sometimes when we run pyorcidator import or import_list, it raises an HTTP 403 Forbidden error:
File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
The stack trace appears to lead to SPARQLWrapper in the helper.lookup_id function.
@cthoyt I tried to install pyorcidator as a package by running pip3 install . in a virtual env. When I try to call the command, it gives the following error:
Traceback (most recent call last):
File "/home/lubianat/Documents/main_venv/bin/pyorcidator", line 5, in <module>
from pyorcidator.cli import cli
ModuleNotFoundError: No module named 'pyorcidator'
It is installed, though, as it is in /home/lubianat/Documents/main_venv/bin/pyorcidator. Do you know what might be happening?
Currently I'm committing these updates directly on master. Probably not a good solution.
Maybe each curator using pyorcidator can have their own branch just for curation and they are merged e.g. once a month?
Hi @lubianat @jvfe, would it be alright if I split the code I wrote for the QuickStatements data model into its own stand-alone package? I also want to write a full-fledged client for interacting with the QuickStatements API, but this isn't the core goal of this package.
I'll keep compatibility with the current interface, so it would later be possible to replace code in this repository with that package.
Using a SPARQL query to get all subclasses of academic title (Q3529618) would be a nice way to pre-populate degrees.json. The following SPARQL query (run at https://w.wiki/5o9H) gets the job done:
SELECT ?itemLabel ?item
WHERE {
?item wdt:P279* wd:Q3529618 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Caveats:
SELECT DISTINCT ?label ?item
WHERE {
?item wdt:P279* wd:Q3529618 .
?item rdfs:label ?label .
}
Note that DISTINCT doesn't collapse entries that are tagged with different languages but have the same label text.
Right now, the construction of the QuickStatements text is really hard to understand, and therefore hard to debug or extend. I would suggest creating a data model (i.e., a set of interconnected classes) that better assists with constructing QuickStatements programmatically and can also implement serialization to text.
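To make the idea concrete, here is a minimal sketch of what such a data model might look like. This is only an illustration of the approach, not the interface being proposed; qualifiers and references would extend naturally as additional fields.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """One QuickStatements line: subject, property, and target value."""

    subject: str    # a QID, or LAST for the most recently created item
    predicate: str  # a PID such as P496 (ORCID iD)
    target: str     # a QID, a quoted string, or a date literal

    def to_quickstatements(self) -> str:
        """Serialize to the tab-separated QuickStatements v1 syntax."""
        return f"{self.subject}\t{self.predicate}\t{self.target}"


# Usage: build statements programmatically, serialize once at the end.
lines = [
    Statement("LAST", "P31", "Q5"),  # instance of: human
    Statement("LAST", "P496", '"0000-0001-7542-0286"'),
]
batch = "\n".join(s.to_quickstatements() for s in lines)
```

The payoff is that validation, deduplication, and alternative output formats (e.g. the QuickStatements API) all become methods on the model rather than string surgery.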
"role-title": "PhD"
- [ ] check how it is modelled on Wikidata
For example: does it add missing papers? Is that optional? Does it try to follow links to websites provided on the profile and extract, for example, a Twitter account or a Google Scholar account? Does it link the (new) ORCID profile to existing articles?
The current code is great for hacking around, but if the end product is a CLI app (if it isn't, disregard this), I'd recommend using something like Click, python-inquirer, and rich.
Leaving these here as suggestions for the future.
Add a CI check to see if the code is properly formatted with black
@jvfe about time we blackened the whole project ;)
For sure, I'll probably include black in the CI too, just to be safe. But that's for another PR.
Originally posted by @jvfe in #41 (comment)
ORCIDs catalogued on Wikidata should be updated, not newly created