itemsubjector's People

Contributors

dependabot[bot], dpriskorn, futur3r

Forkers

futur3r

itemsubjector's Issues

Increase sample size with batch size

A rule of thumb could be to show one sample item for every 20 items in the batch, but always a minimum of 50.
That means:
500 -> 50
1000 -> 50
2000 -> 100
3000 -> 150
4000 -> 200
5000 -> 250
6000 -> 300

For batches larger than 4000, show a warning that the batch is so big that it should be split up, by first running with --no-aliases if possible.
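
The rule of thumb above could be sketched as follows. This is only an illustration, not the project's code: the names `sample_size` and `batch_warning` and the constants are hypothetical, and capping the sample at the batch size for very small batches is an assumption that the issue does not state.

```python
import math
from typing import Optional

MIN_SAMPLE = 50        # minimum sample size proposed in the issue
ITEMS_PER_SAMPLE = 20  # one sampled item per 20 items in the batch
WARN_THRESHOLD = 4000  # batches larger than this get a warning


def sample_size(batch_size: int) -> int:
    """Return how many items to show as a sample for a batch."""
    proportional = math.ceil(batch_size / ITEMS_PER_SAMPLE)
    # Assumption: never ask for more samples than the batch contains.
    return min(batch_size, max(MIN_SAMPLE, proportional))


def batch_warning(batch_size: int) -> Optional[str]:
    """Return a warning text for very large batches, otherwise None."""
    if batch_size > WARN_THRESHOLD:
        return (
            f"The batch has {batch_size} items. Consider splitting it up "
            "by first running with --no-aliases if possible."
        )
    return None


for size in (500, 1000, 2000, 6000):
    print(size, "->", sample_size(size))
```

The loop prints the same pairs as the table in the issue for those four batch sizes.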

Enable searching for alias also

As a user, I want to choose whether to search for items matching the label and/or one of the aliases, so that I get as many hits as possible.

Pseudo code:
  1. Also fetch the aliases from WDQS.
  2. Ask the user which ones to include (or all), see https://console-menu.readthedocs.io/en/latest/consolemenu/MultiSelectMenu.html
  3. Add them to a new attribute in class Labels: search_strings.
  4. Fetch based on that (with one query if possible).
  5. Use https://pmitzias.com/SPARQLBurger/docs.html to generate the SPARQL query using UNION.
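
To make the "one query using UNION" idea concrete, here is a minimal sketch that builds the query text with plain string formatting instead of SPARQLBurger. The function names are hypothetical, and the graph pattern inside each branch (matching English labels that contain the search string) is a placeholder assumption, since the tool's real query is not shown in this issue. The `rdfs:` prefix is predefined on WDQS.

```python
from typing import List


def quote_literal(text: str) -> str:
    """Wrap text in double quotes, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_union_query(search_strings: List[str]) -> str:
    """Build one SELECT query with a UNION branch per search string."""
    if not search_strings:
        raise ValueError("at least one search string is required")
    branches = []
    for text in search_strings:
        literal = quote_literal(text.lower())
        # Placeholder pattern: English labels containing the search string.
        branches.append(
            "  {\n"
            "    ?item rdfs:label ?label .\n"
            '    FILTER(LANG(?label) = "en")\n'
            f"    FILTER(CONTAINS(LCASE(?label), {literal}))\n"
            "  }"
        )
    body = "\n  UNION\n".join(branches)
    return f"SELECT DISTINCT ?item WHERE {{\n{body}\n}}"


# The label plus one alias selected by the user (illustrative values).
query = build_union_query(["natural gas", "fossil gas"])
print(query)
```

Each selected alias adds one branch, so the number of requests to WDQS stays at one regardless of how many search strings the user picks.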

Support dissertations also

Unable to remove articles belonging to specific subjects from the list of articles related to generic subjects

I want to add the following main subjects to the articles

  1. scoping review protocol (Q108684373): very specific
  2. scoping review (Q101116078): generic

and I run the following command (from specific topics to generic topics)

$ python itemsubjector.py -na -l Q108684373 Q101116078

Even though the addition of 'Q108684373' is complete, I see articles with the text 'scoping review protocol' in the list for 'scoping review'.

This issue may be related to Issue 14

Add more tests

  • test queries
  • TestMainSubjectItem
  • test batchjob
  • test batchjobs
  • test sparql item
  • test MainSubjects
  • test items

Issue with git+git

Running command git clone -q git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-e8o5ih20/wikibaseintegrator_728b3c0d1e3b474b9f15e676bf978aca
fatal: remote error:
The unauthenticated git protocol on port 9418 is no longer supported.
Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/LeMyst/[email protected]#egg=wikibaseintegrator. Command errored out with exit status 128: git clone -q git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-e8o5ih20/wikibaseintegrator_728b3c0d1e3b474b9f15e676bf978aca Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement wikibaseintegrator (unavailable)
ERROR: No matching distribution found for wikibaseintegrator (unavailable)
Running command git clone -q git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-dctoaxdd/wikibaseintegrator_82363d8264d0473388254ca8bf6399e6
fatal: remote error:
The unauthenticated git protocol on port 9418 is no longer supported.
Please see https://github.blog/2021-09-01-improving-git-protocol-security-github/ for more information.
WARNING: Discarding git+git://github.com/LeMyst/[email protected]#egg=wikibaseintegrator. Command errored out with exit status 128: git clone -q git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-dctoaxdd/wikibaseintegrator_82363d8264d0473388254ca8bf6399e6 Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement wikibaseintegrator (unavailable)
ERROR: No matching distribution found for wikibaseintegrator (unavailable)
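
The log shows that GitHub no longer accepts the unauthenticated `git://` protocol, so a requirement that starts with `git+git://` cannot be cloned. One possible workaround is to point pip at the same repository over HTTPS with `git+https://`. The sketch below rewrites such a requirement line; the function name is hypothetical and the example line is illustrative only (it omits the pinned revision, which is not legible in the log above).

```python
def use_https_for_git(requirement: str) -> str:
    """Rewrite a pip requirement from git+git:// to git+https://."""
    return requirement.replace("git+git://", "git+https://")


# Illustrative requirement line without a pinned revision.
line = "git+git://github.com/LeMyst/WikibaseIntegrator#egg=wikibaseintegrator"
print(use_https_for_git(line))
```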

Error in PAWS on v0.3.2 when doing single subject

After installing in PAWS I ran this command:

poetry run python itemsubjector.py -a Q40858

I then selected 2 to work on Riksdagen documents. This produced the following output:

Working on naturgas, see http://www.wikidata.org/entity/Q40858
Got a total of 78 items
Please keep an eye on the lag of the WDQS cluster here and avoid working if it is over a few minutes.
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&viewPanel=8&from=now-30m&to=now&refresh=1d You can see if any lagging servers are pooled here
https://config-master.wikimedia.org/pybal/eqiad/wdqs
If any enabled servers are lagging more than 5-10 minutes you can search phabricator for open tickets to see if the team is on it.
If you don't find any feel free to create a new ticket like this:
https://phabricator.wikimedia.org/T291621
Running 1 job(s) with a total of 1 items non-interactively now. You can take a coffee break and lean back :)
Traceback (most recent call last):
  File "/home/paws/.itemsubjector/itemsubjector.py", line 8, in <module>
    itemsubjector.run()
  File "/home/paws/.itemsubjector/src/__init__.py", line 164, in run
    handle_job_preparation_or_run_directly_if_any_jobs(
  File "/home/paws/.itemsubjector/src/helpers/jobs.py", line 154, in handle_job_preparation_or_run_directly_if_any_jobs
    batchjobs.run_jobs()
  File "/home/paws/.itemsubjector/src/models/batch_jobs.py", line 45, in run_jobs
    job.suggestion.add_to_items(
  File "/home/paws/.itemsubjector/src/models/suggestion.py", line 111, in add_to_items
    f"to {clean_rich_formatting(target_item.label)}"
  File "/home/paws/.itemsubjector/src/helpers/cleaning.py", line 24, in clean_rich_formatting
    return label.replace("[/", "['/")
AttributeError: 'NoneType' object has no attribute 'replace'
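
The traceback suggests that `target_item.label` was `None` when it reached `clean_rich_formatting`. A minimal sketch of a guard is shown below. It reuses the replacement from the traceback line; the placeholder text returned for a missing label is an assumption, not the project's actual fix.

```python
from typing import Optional


def clean_rich_formatting(label: Optional[str]) -> str:
    """Apply the replacement from the traceback, tolerating a missing label."""
    if label is None:
        # Assumption: use a placeholder when the item has no label.
        return "(no label)"
    return label.replace("[/", "['/")


print(clean_rich_formatting(None))
print(clean_rich_formatting("naturgas"))
```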

PAWS v0.3.3: no_alias_for_scholarly_items error

I get this error in PAWS on v0.3.3 when doing single subject:

Picking a random main subject
Working on naturgas
Do you want to continue? [Y/Enter/n]: 
Traceback (most recent call last):
  File "/home/paws/.itemsubjector/itemsubjector.py", line 8, in <module>
    itemsubjector.run()
  File "/home/paws/.itemsubjector/src/__init__.py", line 79, in run
    main_subjects.get_validated_main_subjects_as_jobs()
  File "/home/paws/.itemsubjector/src/models/main_subjects.py", line 108, in get_validated_main_subjects_as_jobs
    job = main_subject_item.fetch_items_and_get_job_if_confirmed()
  File "/home/paws/.itemsubjector/src/models/wikimedia/wikidata/item/main_subject.py", line 240, in fetch_items_and_get_job_if_confirmed
    return self.__fetch_and_parse__()
  File "/home/paws/.itemsubjector/src/models/wikimedia/wikidata/item/main_subject.py", line 250, in __fetch_and_parse__
    self.__prepare_before_fetching_items__()
  File "/home/paws/.itemsubjector/src/models/wikimedia/wikidata/item/main_subject.py", line 188, in __prepare_before_fetching_items__
    self.__extract_search_strings__()
  File "/home/paws/.itemsubjector/src/models/wikimedia/wikidata/item/main_subject.py", line 141, in __extract_search_strings__
    elif self.id in config.no_alias_for_scholarly_items:
AttributeError: module 'config' has no attribute 'no_alias_for_scholarly_items'

My command was poetry run python itemsubjector.py -a Q40858
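
The error indicates that the local `config` module predates the `no_alias_for_scholarly_items` setting. Besides adding the attribute to the config file, the lookup could fall back to an empty list when the attribute is absent. The sketch below uses a stand-in module object; the helper name `no_alias_ids` is hypothetical and this is not the project's code.

```python
import types

# Stand-in for a user's config module that lacks the new attribute.
config = types.ModuleType("config")


def no_alias_ids(config_module) -> list:
    """Return the configured exclusion list, or an empty list if unset."""
    return list(getattr(config_module, "no_alias_for_scholarly_items", []))


item_id = "Q40858"
print(item_id in no_alias_ids(config))  # nothing configured yet

# Once the attribute exists, the same lookup picks it up.
config.no_alias_for_scholarly_items = ["Q40858"]
print(item_id in no_alias_ids(config))
```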

Add a Web UI

Similar to QuickStatements batches, ItemSubjector could have a Flask frontend that runs on Toolforge and executes the users' batches.

This requires OAuth and Flask.
Lucas made a good Toolforge Flask template to get started with.

ModuleNotFoundError: No module named 'config.items'

After checking out the latest version and 0.3-alpha2, I am getting the following error:

Traceback (most recent call last):
  File "itemsubjector.py", line 3, in <module>
    import src
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/tmp/ItemSubjector-0.3-alpha2/src/__init__.py", line 11, in <module>
    from src.helpers.console import (
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/tmp/ItemSubjector-0.3-alpha2/src/helpers/console.py", line 11, in <module>
    from src.models.batch_job import BatchJob
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/tmp/ItemSubjector-0.3-alpha2/src/models/batch_job.py", line 3, in <module>
    from src.models.items import Items
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/tmp/ItemSubjector-0.3-alpha2/src/models/items/__init__.py", line 10, in <module>
    from src.models.wikimedia.wikidata.sparql_item import SparqlItem
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/tmp/ItemSubjector-0.3-alpha2/src/models/wikimedia/wikidata/sparql_item.py", line 4, in <module>
    import config.items
ModuleNotFoundError: No module named 'config.items'

Support approving first and running later aka batch mode

Jean-Fred:
Run the interactive part in the shell on Toolforge, and from there kick off a grid engine job?

Dennis Priskorn:
I have not learned how the grid engine works yet.
Maybe a new flag --grid-engine could be added that saves the QIDs to be processed in a pickle.
Then a new script could read that and run a non-interactive batch for each one?
The latter can be executed in the engine as a job.

--approve-only might be a better name
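
The approve-first, run-later idea from the discussion above could look roughly like this. It is a sketch under assumptions: the function names and the `process` callback are hypothetical, and a real batch would store whatever job data the tool needs rather than bare QIDs.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Callable, List


def save_approved(qids: List[str], path: Path) -> None:
    """Interactive step: store the approved QIDs for later processing."""
    with path.open("wb") as handle:
        pickle.dump(qids, handle)


def load_approved(path: Path) -> List[str]:
    """Non-interactive step: read the approved QIDs back."""
    with path.open("rb") as handle:
        return pickle.load(handle)


def run_batch(qids: List[str], process: Callable[[str], None]) -> int:
    """Run the given callback once per QID and return the number processed."""
    for qid in qids:
        process(qid)
    return len(qids)


with tempfile.TemporaryDirectory() as folder:
    job_file = Path(folder) / "approved.pkl"
    save_approved(["Q108684373", "Q101116078"], job_file)
    done = run_batch(load_approved(job_file), lambda qid: print("running", qid))
    print(done, "jobs done")
```

The second half (`load_approved` plus `run_batch`) is the part that would be submitted as a job, since it needs no user input.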

OAuth authentication error

Thanks for correcting the previous errors in 0.3-alpha3.

I checked out the latest commit in the main branch and 0.3-alpha4, and I now face an OAuth error.

  1. I checked with username/password
  2. I checked with botname/password
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/my_venv/lib/python3.7/site-packages/oauthlib/oauth2/rfc6749/parameters.py", line 432, in validate_token_parameters
    raise_from_error(params.get('error'), params)
  File "/mnt/nfs/labstore-secondary-tools-project/itemsubjector-jsamwrites/itemsubjector/my_venv/lib/python3.7/site-packages/oauthlib/oauth2/rfc6749/errors.py", line 402, in raise_from_error
    raise cls(**kwargs)
oauthlib.oauth2.rfc6749.errors.InvalidClientIdError: (invalid_request) The request is missing a required parameter, includes an invalid parameter value, includes a parameter more than once, or is otherwise malformed.

Any idea what causes this error?

I checked with other scripts of mine. There are no issues.

Cannot install in PAWS

I have checked out the latest version, v0.3.1, in PAWS and tried to run pip install -r requirements.txt as the README says.
I then get the error message:

ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'

Labels with apostrophe(') do not work

Labels with an apostrophe (') currently do not work. I think that an escape character needs to be added before sending the query string to WDQS.

Take for example:

Alzheimer's disease (Q11081)

returns the following error

Fetching items with labels that have one of the search strings by running a total of 11 queries on WDQS...INFO:backoff:Backing off execute_sparql_query(...) for 1.0s (requests.exceptions.HTTPError: 400 Client Error: Bad Request for url
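
If the 400 response is indeed caused by the unescaped apostrophe, as the issue suggests, escaping the label before placing it in a SPARQL string literal might resolve it. The sketch below covers the backslash escapes that SPARQL 1.1 defines for quotes, backslashes and common whitespace; the function name is hypothetical and this is not the project's code.

```python
# Characters that need a backslash escape inside a SPARQL string literal.
_ESCAPES = {
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_sparql_string(text: str) -> str:
    """Escape text so it can sit inside a quoted SPARQL literal."""
    return "".join(_ESCAPES.get(char, char) for char in text)


label = "Alzheimer's disease"
literal = f"'{escape_sparql_string(label)}'"
print(literal)  # 'Alzheimer\'s disease'
```

Escaping both quote characters keeps the helper usable whether the query wraps the literal in single or double quotes.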

Error on pip install

In v0.2 I am trying pip install -r requirements.txt in PAWS and get this error message:

Collecting wikibaseintegrator
  Cloning git://github.com/LeMyst/WikibaseIntegrator (to revision v0.12.0.dev5) to /tmp/pip-install-h0jhod33/wikibaseintegrator_2f94ad8cb5b244b3816e997a960745eb
  Running command git clone --filter=blob:none --quiet git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-h0jhod33/wikibaseintegrator_2f94ad8cb5b244b3816e997a960745eb
  fatal: unable to connect to github.com:
  github.com[0: 140.82.113.4]: errno=Connection timed out

  error: subprocess-exited-with-error
  
  × git clone --filter=blob:none --quiet git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-h0jhod33/wikibaseintegrator_2f94ad8cb5b244b3816e997a960745eb did not run successfully.
  │ exit code: 128
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet git://github.com/LeMyst/WikibaseIntegrator /tmp/pip-install-h0jhod33/wikibaseintegrator_2f94ad8cb5b244b3816e997a960745eb did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

What should I do?
