lubianat / pyorcidator (License: MIT)
Since the end result of the two modules is essentially the same, the desired functionality of reading from a list of ORCIDs could be achieved with a type guard in import_info.py: if the argument provided is a Path() that exists, read it and loop through the ORCIDs; if it is a string, just send it to render_orcid_qs. I believe this would make the code simpler and cleaner.
Currently, if an ORCID user's social links tab contains a URL without 'https://', the link isn't recognized by pyorcidator. For example, the GitHub link for https://orcid.org/0000-0001-7542-0286 (github.com/egonw). This could probably be solved by using a regex instead of the current rv[key] = url[len(url_prefix):]. I might send a possible solution soon.
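One possible direction, as a sketch: make the scheme and "www." prefixes optional in a regex per site. The dictionary keys and function name below are illustrative, not pyorcidator's actual internals.

```python
import re

# Hypothetical patterns: scheme and "www." are optional, so a bare
# "github.com/egonw" matches just as well as the full https:// URL.
LINK_PATTERNS = {
    "github": re.compile(r"^(?:https?://)?(?:www\.)?github\.com/([\w.-]+)"),
    "twitter": re.compile(r"^(?:https?://)?(?:www\.)?twitter\.com/([\w.-]+)"),
}


def extract_username(url: str) -> dict:
    """Return {site: username} for any recognized social link."""
    rv = {}
    for key, pattern in LINK_PATTERNS.items():
        match = pattern.match(url.strip())
        if match:
            rv[key] = match.group(1)
    return rv


print(extract_username("github.com/egonw"))  # {'github': 'egonw'}
```

This also keeps the username extraction in one place, so adding a new site is just one more pattern.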
e.g.:
"organization": {
"name": "University of Regensburg",
"address": {
"city": "Regensburg",
"region": "Bayern",
"country": "DE"
},
"disambiguated-organization": {
"disambiguated-organization-identifier": "grid.7727.5",
"disambiguation-source": "GRID"
}
}
The code should look up the id on Wikidata before asking the user for the key.
Currently, the precision for month-only dates is not set correctly.
See https://www.wikidata.org/w/index.php?title=Q90094381&oldid=1750716604 and https://orcid.org/0000-0002-6500-856X .
The date is being added as the first day of the month, which asserts a false precision.
I've created a package for basic Wikidata curation tools:
I use them across various projects. I'll adapt pyorcidator to use functions from that package, but I am a bit unsure on how to best do that.
Perhaps I'll push a version of wdcuration to PyPI and add it as a requirement. Does anyone have thoughts on that?
In the branch tests/get_org_list_bug I built a simple test to check whether helper.get_organization_list could return a list of all organizations in the sample data. However, it currently only returns 2 of the 4 organizations. Output of pytest -v:
E AssertionError: assert ['Harvard Med...l', 'Q152171'] == ['Harvard Med...unhofer SCAI']
E At index 1 diff: 'Q152171' != 'Enveda Biosciences'
E Right contains 2 more items, first extra item: 'Q152171'
E Full diff:
E - ['Harvard Medical School', 'Enveda Biosciences', 'Q152171', 'Fraunhofer SCAI']
E + ['Harvard Medical School', 'Q152171']
As you can see, the function only returns the first and third employment entries.
I believe this piece of code is the reason:
if a["disambiguated-organization"] is None:
continue
If there is no disambiguated-organization for an entry (which is the case for the second and fourth entries shown above), it jumps to the next entry in the list instead of returning that entry at the end. So, what's the reason behind this line? Is there something I'm missing here? Thanks
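If the skip is not intentional, one option is to fall back to the plain organization name when the disambiguated id is absent. A sketch only; the real helper's structure and return values may differ:

```python
def get_organization_list(employments: list) -> list:
    """Return one label per employment entry instead of skipping some.

    Entries carrying a disambiguated id use it (in the real code this would
    presumably be resolved to a Wikidata QID); entries without one fall back
    to the plain organization name rather than being dropped via `continue`.
    """
    organizations = []
    for entry in employments:
        org = entry["organization"]
        disambiguated = org.get("disambiguated-organization")
        if disambiguated is not None:
            organizations.append(disambiguated["disambiguated-organization-identifier"])
        else:
            organizations.append(org["name"])
    return organizations
```

That would make the test above return all four organizations, at the cost of mixing resolved ids with raw names, which may or may not be what the original author intended.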
Dates are currently only parsed for education.
e.g. Rodrigo Dalmolin
Rodrigo - https://www.wikidata.org/wiki/Q4927979 (male given name)
Dalmolin - https://www.wikidata.org/wiki/Q37464573 (family name)
From male given name, infer sex or gender (P21) --> male
Watch out for:
I believe testing is a good thing to work on going forward, since the base functionality of the code is more or less achieved. As we implement new features, it would be a nice safety net to have an adjacent test suite, so we don't break the basics. I'd recommend going with pytest, but Python's own testing module, unittest, is pretty good too.
Sometimes a term is not on Wikidata and we'd rather just exclude it instead of going through the effort of creating a new entry.
Not sure how to implement:
While the goal of this repo is to have a human-in-the-loop to help disambiguate affiliations and other aspects of ORCID pages, it would be nice to have a non-interactive mode that allows for a given ORCID to just do a simple set of things (instance of, occupation, ORCID annotation) and some other high-confidence resolution of affiliations.
Related to cthoyt/orcidio#7
Sometimes when we run pyorcidator import or import_list, it raises an HTTP 403 Forbidden error:
File "/usr/lib/python3.10/urllib/request.py", line 643, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
The stack trace appears to lead to SPARQLWrapper in the helper.lookup_id function.
@cthoyt I tried to install pyorcidator as a package by running pip3 install . in a virtual env. When I try to call the command, it gives the following error:
Traceback (most recent call last):
File "/home/lubianat/Documents/main_venv/bin/pyorcidator", line 5, in <module>
from pyorcidator.cli import cli
ModuleNotFoundError: No module named 'pyorcidator'
It is installed, though, as it is in /home/lubianat/Documents/main_venv/bin/pyorcidator. Do you know what might be happening?
Currently I'm committing these updates directly on master. Probably not a good solution.
Maybe each curator using pyorcidator can have their own branch just for curation and they are merged e.g. once a month?
Hi @lubianat @jvfe, would it be alright if I split the code I wrote for the QuickStatements data model into its own stand-alone package? I also want to write a full-fledged client for interacting with the QuickStatements API, but this isn't the core goal of this package.
I'll keep compatibility with the current interface, so it would later be possible to replace code in this repository with that package.
Using a SPARQL query to get all subclasses of academic title (Q3529618) would be a nice way to pre-populate degrees.json. The following SPARQL query (run at https://w.wiki/5o9H) gets the job done:
SELECT ?itemLabel ?item
WHERE {
?item wdt:P279* wd:Q3529618 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Caveats:
SELECT DISTINCT ?label ?item
WHERE {
?item wdt:P279* wd:Q3529618 .
?item rdfs:label ?label .
}
Note that DISTINCT doesn't collapse entries that are tagged with different languages but have the same label text.
Right now, the construction of the QuickStatements text is really hard to understand, and therefore hard to debug or extend. I would suggest creating a data model (i.e., a set of interconnected classes) that better assists with constructing QuickStatements programmatically and can also implement serialization to text.
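To make the idea concrete, here is a minimal sketch of what such a data model might look like. This is only an illustration of the approach, not the interface being proposed; qualifiers and references would extend naturally as additional fields.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    """One QuickStatements line: subject, property, and target value."""

    subject: str    # a QID, or LAST for the most recently created item
    predicate: str  # a PID such as P496 (ORCID iD)
    target: str     # a QID, a quoted string, or a date literal

    def to_quickstatements(self) -> str:
        """Serialize to the tab-separated QuickStatements v1 syntax."""
        return f"{self.subject}\t{self.predicate}\t{self.target}"


# Usage: build statements programmatically, serialize once at the end.
lines = [
    Statement("LAST", "P31", "Q5"),  # instance of: human
    Statement("LAST", "P496", '"0000-0001-7542-0286"'),
]
batch = "\n".join(s.to_quickstatements() for s in lines)
```

The payoff is that validation, deduplication, and alternative output formats (e.g. the QuickStatements API) all become methods on the model rather than string surgery.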
"role-title": "PhD"
- [ ] check how it is modelled on Wikidata
For example: does it add missing papers? Is that optional? Does it try to follow links to websites provided on the profile and extract, for example, a Twitter account or a Google Scholar account? Does it link the (new) ORCID profile to existing articles?
The current code is great for hacking around, but if the end product is a CLI app (if it isn't, disregard this), I'd recommend using something like Click, python-inquirer, and rich.
Leaving these here as suggestions for the future.
Add a CI check to see if the code is properly formatted with black
@jvfe about time we blackened the whole project ;)
For sure, I'll probably include black in the CI too, just to be safe. But that's for another PR.
Originally posted by @jvfe in #41 (comment)
ORCIDs catalogued on Wikidata should be updated, not newly created